[jira] [Assigned] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2020-10-23 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20273:
---

Assignee: (was: Sahil Takiar)

> Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo
> 
>
> Key: HIVE-20273
> URL: https://issues.apache.org/jira/browse/HIVE-20273
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20273.1.patch, HIVE-20273.2.patch
>
>
> HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
> {{RemoteSparkJobStatus#getSparkJobInfo}} and 
> {{RemoteSparkJobStatus#getSparkStagesInfo}}. Now, these methods catch 
> {{InterruptedException}} and wrap the exception in a {{HiveException}} and 
> then throw the new {{HiveException}}.
> This new {{HiveException}} is then caught in 
> {{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
> match the condition:
> {code:java}
> if (e instanceof InterruptedException ||
> (e instanceof HiveException && e.getCause() instanceof 
> InterruptedException))
> {code}
> If this condition is met (in this case it is), the exception will again be 
> wrapped in another {{HiveException}} and is thrown again. So the final 
> exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
> {{InterruptedException}}.
> The double nesting of {{HiveException}} causes the logic in 
> {{SparkTask#setSparkException}} to break, so {{killJob}} doesn't get 
> triggered.
> This causes interrupted Hive queries to not kill their corresponding Spark 
> jobs.
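
For illustration, a minimal sketch (a hypothetical stand-in, not the actual SparkTask code) of why a single-level {{getCause()}} check misses the double-wrapped {{InterruptedException}}, while walking the full cause chain finds it:

{code:java}
import org.apache.hadoop.hive.ql.metadata.HiveException;

public class CauseChainSketch {
  public static void main(String[] args) {
    // Simulate the double wrapping described above.
    Throwable twice = new HiveException(new HiveException(new InterruptedException()));

    // Single-level check (the pattern quoted above): fails for the double-wrapped case.
    boolean singleLevel = twice instanceof InterruptedException
        || (twice instanceof HiveException && twice.getCause() instanceof InterruptedException);
    System.out.println("single-level check finds it: " + singleLevel); // false

    // Walking the whole cause chain finds the InterruptedException regardless of nesting.
    boolean found = false;
    for (Throwable t = twice; t != null; t = t.getCause()) {
      if (t instanceof InterruptedException) {
        found = true;
        break;
      }
    }
    System.out.println("cause-chain walk finds it: " + found); // true
  }
}
{code}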



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2020-10-23 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20519:
---

Assignee: (was: Sahil Takiar)

> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20519.1.patch, HIVE-20519.2.patch, 
> HIVE-20519.3.patch
>
>
> In HIVE-14162 we added the config {{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-20828) Upgrade to Spark 2.4.0

2020-10-23 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20828:
---

Assignee: (was: Sahil Takiar)

> Upgrade to Spark 2.4.0
> --
>
> Key: HIVE-20828
> URL: https://issues.apache.org/jira/browse/HIVE-20828
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20828.1.patch, HIVE-20828.2.patch
>
>
> The Spark community is in the process of releasing Spark 2.4.0. We should do 
> some testing with the RC candidates and then upgrade once the release is 
> finalized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-19821) Distributed HiveServer2

2020-10-23 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-19821:
---

Assignee: (was: Sahil Takiar)

> Distributed HiveServer2
> ---
>
> Key: HIVE-19821
> URL: https://issues.apache.org/jira/browse/HIVE-19821
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19821.1.WIP.patch, HIVE-19821.2.WIP.patch, 
> HIVE-19821_ Distributed HiveServer2.pdf
>
>
> HS2 deployments often hit OOM issues due to a number of factors: (1) too many 
> concurrent connections, (2) queries that scan a large number of partitions 
> have to pull a lot of metadata into memory (e.g. a query reading thousands of 
> partitions requires loading thousands of partitions into memory), (3) very 
> large queries can take up a lot of heap space, especially during query 
> parsing. There are a number of other factors that cause HiveServer2 to run 
> out of memory; these are just some of the more common ones.
> Distributed HS2 proposes to do all query parsing, compilation, planning, and 
> execution coordination inside a dedicated container. This should 
> significantly decrease memory pressure on HS2 and allow HS2 to scale to a 
> larger number of concurrent users.
> For HoS (and I think Hive-on-Tez) this just requires moving all query 
> compilation, planning, etc. inside the application master for the 
> corresponding Hive session.
> The main benefit here is isolation. A poorly written Hive query cannot bring 
> down an entire HiveServer2 instance and force all other queries to fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-14165) Remove Hive file listing during split computation

2020-02-10 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033713#comment-17033713
 ] 

Sahil Takiar commented on HIVE-14165:
-

Marking as unassigned as I am no longer working on this. IIRC this speedup only 
applies to very simple queries - e.g. select / project queries.

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Priority: Major
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, 
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, 
> HIVE-14165.07.patch, HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputException thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.
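
As a rough illustration (a hedged sketch, not the actual FetchOperator change; the helper name is hypothetical), the idea is to let {{getSplits()}} do the listing and surface missing input as an exception:

{code:java}
import java.io.IOException;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.InvalidInputException;
import org.apache.hadoop.mapred.JobConf;

public class SplitSketch {
  // Hypothetical helper: rely on getSplits() to list files internally instead of
  // pre-listing them on the Hive side; treat missing/empty input as zero splits.
  static InputSplit[] computeSplits(InputFormat<?, ?> format, JobConf job, int numSplits)
      throws IOException {
    try {
      return format.getSplits(job, numSplits);
    } catch (InvalidInputException e) {
      // Input path does not exist or matched no files; no pre-listing was needed to find out.
      return new InputSplit[0];
    }
  }
}
{code}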



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-14165) Remove Hive file listing during split computation

2020-02-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-14165:
---

Assignee: (was: Sahil Takiar)

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Priority: Major
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, 
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, 
> HIVE-14165.07.patch, HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputException thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-16295) Add support for using Hadoop's S3A OutputCommitter

2020-01-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007583#comment-17007583
 ] 

Sahil Takiar commented on HIVE-16295:
-

I'm no longer working on this, so marking as unassigned.

> Add support for using Hadoop's S3A OutputCommitter
> --
>
> Key: HIVE-16295
> URL: https://issues.apache.org/jira/browse/HIVE-16295
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: HIVE-16295.1.WIP.patch, HIVE-16295.2.WIP.patch, 
> HIVE-16295.3.WIP.patch, HIVE-16295.4.patch, HIVE-16295.5.patch, 
> HIVE-16295.6.patch, HIVE-16295.7.patch, HIVE-16295.8.patch, HIVE-16295.9.patch
>
>
> Hive doesn't have integration with Hadoop's {{OutputCommitter}}; it uses a 
> {{NullOutputCommitter}} and its own commit logic spread across 
> {{FileSinkOperator}}, {{MoveTask}}, and {{Hive}}.
> The Hadoop community is building an {{OutputCommitter}} that integrates with 
> S3Guard and does a safe, coordinated commit of data on S3 inside individual 
> tasks (HADOOP-13786). If Hive can integrate with this new {{OutputCommitter}} 
> there would be a lot of benefits to Hive-on-S3:
> * Data is only written once; directly committing data at a task level means 
> no renames are necessary
> * The commit is done safely, in a coordinated manner; duplicate tasks (from 
> task retries or speculative execution) should not step on each other



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-16295) Add support for using Hadoop's S3A OutputCommitter

2020-01-03 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16295:
---

Assignee: (was: Sahil Takiar)

> Add support for using Hadoop's S3A OutputCommitter
> --
>
> Key: HIVE-16295
> URL: https://issues.apache.org/jira/browse/HIVE-16295
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: HIVE-16295.1.WIP.patch, HIVE-16295.2.WIP.patch, 
> HIVE-16295.3.WIP.patch, HIVE-16295.4.patch, HIVE-16295.5.patch, 
> HIVE-16295.6.patch, HIVE-16295.7.patch, HIVE-16295.8.patch, HIVE-16295.9.patch
>
>
> Hive doesn't have integration with Hadoop's {{OutputCommitter}}; it uses a 
> {{NullOutputCommitter}} and its own commit logic spread across 
> {{FileSinkOperator}}, {{MoveTask}}, and {{Hive}}.
> The Hadoop community is building an {{OutputCommitter}} that integrates with 
> S3Guard and does a safe, coordinated commit of data on S3 inside individual 
> tasks (HADOOP-13786). If Hive can integrate with this new {{OutputCommitter}} 
> there would be a lot of benefits to Hive-on-S3:
> * Data is only written once; directly committing data at a task level means 
> no renames are necessary
> * The commit is done safely, in a coordinated manner; duplicate tasks (from 
> task retries or speculative execution) should not step on each other



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-20079) Populate more accurate rawDataSize for parquet format

2019-02-20 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773098#comment-16773098
 ] 

Sahil Takiar commented on HIVE-20079:
-

How does ORC handle this? Is there a fundamental reason we can't mimic what it 
does? Staying consistent with how ORC handles this makes more sense to me than 
implementing two different approaches for ORC vs. Parquet and ending up with an 
inconsistent definition of {{rawDataSize}} depending on the file format. That 
said, this patch is probably a better estimation than what we have today, so I 
see no reason not to proceed with it.

> Populate more accurate rawDataSize for parquet format
> -
>
> Key: HIVE-20079
> URL: https://issues.apache.org/jira/browse/HIVE-20079
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20079.1.patch, HIVE-20079.2.patch, 
> HIVE-20079.3.patch
>
>
> Run the following queries and you will see that the rawDataSize for the table 
> is incorrectly reported as 4 (the number of fields). We need to populate the 
> correct data size so data can be split properly.
> {noformat}
> SET hive.stats.autogather=true;
> CREATE TABLE parquet_stats (id int,str string) STORED AS PARQUET;
> INSERT INTO parquet_stats values(0, 'this is string 0'), (1, 'string 1');
> DESC FORMATTED parquet_stats;
> {noformat}
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles1
>   numRows 2
>   rawDataSize 4
>   totalSize   373
>   transient_lastDdlTime   1530660523
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20079) Populate more accurate rawDataSize for parquet format

2019-02-20 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773046#comment-16773046
 ] 

Sahil Takiar commented on HIVE-20079:
-

FYI I don't think {{block.getTotalByteSize}} provides the size of the data when 
loaded into memory. After talking to a few Parquet folks, it seems no such 
method to get the raw data size exists. If we want to implement this patch we 
will have to do something similar to what ORC does - 
https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L601

> Populate more accurate rawDataSize for parquet format
> -
>
> Key: HIVE-20079
> URL: https://issues.apache.org/jira/browse/HIVE-20079
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20079.1.patch, HIVE-20079.2.patch, 
> HIVE-20079.3.patch
>
>
> Run the following queries and you will see that the rawDataSize for the table 
> is incorrectly reported as 4 (the number of fields). We need to populate the 
> correct data size so data can be split properly.
> {noformat}
> SET hive.stats.autogather=true;
> CREATE TABLE parquet_stats (id int,str string) STORED AS PARQUET;
> INSERT INTO parquet_stats values(0, 'this is string 0'), (1, 'string 1');
> DESC FORMATTED parquet_stats;
> {noformat}
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles1
>   numRows 2
>   rawDataSize 4
>   totalSize   373
>   transient_lastDdlTime   1530660523
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20125) Typo in MetricsCollection for OutputMetrics

2018-12-06 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711806#comment-16711806
 ] 

Sahil Takiar commented on HIVE-20125:
-

+1 LGTM

> Typo in MetricsCollection for OutputMetrics
> ---
>
> Key: HIVE-20125
> URL: https://issues.apache.org/jira/browse/HIVE-20125
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Adesh Kumar Rao
>Priority: Major
> Attachments: HIVE-20125.1.patch
>
>
> When creating {{OutputMetrics}} in the {{aggregate}} method we check for 
> {{hasInputMetrics}} instead of {{hasOutputMetrics}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20440) Create better cache eviction policy for SmallTableCache

2018-12-03 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707458#comment-16707458
 ] 

Sahil Takiar commented on HIVE-20440:
-

+1 LGTM

> Create better cache eviction policy for SmallTableCache
> ---
>
> Key: HIVE-20440
> URL: https://issues.apache.org/jira/browse/HIVE-20440
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20440.01.patch, HIVE-20440.02.patch, 
> HIVE-20440.03.patch, HIVE-20440.04.patch, HIVE-20440.05.patch, 
> HIVE-20440.06.patch, HIVE-20440.07.patch, HIVE-20440.08.patch, 
> HIVE-20440.09.patch, HIVE-20440.10.patch, HIVE-20440.11.patch, 
> HIVE-20440.12.patch, HIVE-20440.13.patch, HIVE-20440.14.patch.txt, 
> HIVE-20440.15.patch
>
>
> Enhance the SmallTableCache to use a Guava cache with soft references, so that 
> we evict when there is memory pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20828) Upgrade to Spark 2.4.0

2018-11-27 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20828:

Attachment: HIVE-20828.2.patch

> Upgrade to Spark 2.4.0
> --
>
> Key: HIVE-20828
> URL: https://issues.apache.org/jira/browse/HIVE-20828
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20828.1.patch, HIVE-20828.2.patch
>
>
> The Spark community is in the process of releasing Spark 2.4.0. We should do 
> some testing with the RC candidates and then upgrade once the release is 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20969) HoS sessionId generation can cause race conditions when uploading files to HDFS

2018-11-26 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699412#comment-16699412
 ] 

Sahil Takiar commented on HIVE-20969:
-

+1 LGTM pending tests.

> HoS sessionId generation can cause race conditions when uploading files to 
> HDFS
> ---
>
> Key: HIVE-20969
> URL: https://issues.apache.org/jira/browse/HIVE-20969
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 4.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-20969.2.patch, HIVE-20969.patch
>
>
> The observed exception is:
> {code}
> Caused by: java.io.FileNotFoundException: File does not exist: 
> /tmp/hive/_spark_session_dir/0/hive-exec-2.1.1-SNAPSHOT.jar (inode 21140) 
> [Lease.  Holder: DFSClient_NONMAPREDUCE_304217459_39, pending creates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2781)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2660)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20969) HoS sessionId generation can cause race conditions when uploading files to HDFS

2018-11-26 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699343#comment-16699343
 ] 

Sahil Takiar commented on HIVE-20969:
-

The intention of HIVE-19008 was to simplify the session id logic in HoS. Before 
HIVE-19008, the HoS session id was a UUID that was completely independent of 
the Hive session id. After HIVE-19008, the HoS session id is a counter that is 
incremented for each new Spark session created for a given Hive session.

{quote} I would assume that it would be good to connect the spark session to 
the hive session in every log message so it would be good if the sparkSessionId 
would contain the hive session id too. {quote}

Adding the hive session id into the spark session id sounds like a reasonable 
idea to me. Logically, that is what HIVE-19008 already does. After HIVE-19008, 
any spark session id is globally identifiable by the Hive session id + Spark 
session id. Again, prior to HIVE-19008 the sparkSessionId was a UUID that was 
independent of the hive session id.
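
A minimal sketch of that scheme (hypothetical class and field names, not the actual HIVE-19008 code): each Hive session keeps a counter, and the globally identifiable value is the Hive session id plus the per-session counter value:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class SparkSessionIdSketch {
  private final String hiveSessionId;                    // e.g. a UUID owned by the Hive session
  private final AtomicInteger sparkSessionCounter = new AtomicInteger(0);

  public SparkSessionIdSketch(String hiveSessionId) {
    this.hiveSessionId = hiveSessionId;
  }

  // Each new Spark session within this Hive session gets the next counter value;
  // combining the two ids yields a globally identifiable Spark session id.
  public String nextSparkSessionId() {
    return hiveSessionId + "_" + sparkSessionCounter.getAndIncrement();
  }
}
{code}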

> HoS sessionId generation can cause race conditions when uploading files to 
> HDFS
> ---
>
> Key: HIVE-20969
> URL: https://issues.apache.org/jira/browse/HIVE-20969
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 4.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> The observed exception is:
> {code}
> Caused by: java.io.FileNotFoundException: File does not exist: 
> /tmp/hive/_spark_session_dir/0/hive-exec-2.1.1-SNAPSHOT.jar (inode 21140) 
> [Lease.  Holder: DFSClient_NONMAPREDUCE_304217459_39, pending creates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2781)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2660)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20512:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, 
> HIVE-20512.9.patch, HIVE-20512.91.patch, HIVE-20512.92.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-08 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680044#comment-16680044
 ] 

Sahil Takiar commented on HIVE-20512:
-

+1 pending tests.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, HIVE-20512.9.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-05 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675889#comment-16675889
 ] 

Sahil Takiar commented on HIVE-20512:
-

I'm not sure why {{awaitTermination}} would be causing the tests to time out. Do 
they hang locally? If not, it could have just been a temporary test infra 
issue. The problem with calling {{shutdownNow}} directly is that it cancels any 
in-progress tasks by interrupting the running threads. This can lead to 
spurious errors in the task logs, which can be confusing. It's generally 
recommended to follow the shutdown pattern outlined in 
[https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html]
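
For reference, the shutdown pattern from that javadoc looks roughly like the sketch below (the 60-second timeouts are illustrative, not values the patch must use):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class ShutdownSketch {
  // Orderly shutdown: stop accepting new tasks, give running tasks time to finish,
  // and only then fall back to interrupting them with shutdownNow().
  static void shutdownAndAwaitTermination(ExecutorService pool) {
    pool.shutdown(); // no new tasks accepted, queued tasks still run
    try {
      if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
        pool.shutdownNow(); // interrupt anything still running
        if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
          System.err.println("Pool did not terminate");
        }
      }
    } catch (InterruptedException ie) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }
}
{code}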

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20079) Populate more accurate rawDataSize for parquet format

2018-11-05 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20079:
---

Assignee: (was: Sahil Takiar)

> Populate more accurate rawDataSize for parquet format
> -
>
> Key: HIVE-20079
> URL: https://issues.apache.org/jira/browse/HIVE-20079
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Priority: Major
> Attachments: HIVE-20079.1.patch, HIVE-20079.2.patch
>
>
> Run the following queries and you will see that the rawDataSize for the table 
> is incorrectly reported as 4 (the number of fields). We need to populate the 
> correct data size so data can be split properly.
> {noformat}
> SET hive.stats.autogather=true;
> CREATE TABLE parquet_stats (id int,str string) STORED AS PARQUET;
> INSERT INTO parquet_stats values(0, 'this is string 0'), (1, 'string 1');
> DESC FORMATTED parquet_stats;
> {noformat}
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles1
>   numRows 2
>   rawDataSize 4
>   totalSize   373
>   transient_lastDdlTime   1530660523
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20828) Upgrade to Spark 2.4.0

2018-10-30 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20828:

Attachment: HIVE-20828.1.patch

> Upgrade to Spark 2.4.0
> --
>
> Key: HIVE-20828
> URL: https://issues.apache.org/jira/browse/HIVE-20828
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20828.1.patch
>
>
> The Spark community is in the process of releasing Spark 2.4.0. We should do 
> some testing with the RC candidates and then upgrade once the release is 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20828) Upgrade to Spark 2.4.0

2018-10-30 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20828:

Status: Patch Available  (was: Open)

> Upgrade to Spark 2.4.0
> --
>
> Key: HIVE-20828
> URL: https://issues.apache.org/jira/browse/HIVE-20828
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20828.1.patch
>
>
> The Spark community is in the process of releasing Spark 2.4.0. We should do 
> some testing with the RC candidates and then upgrade once the release is 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20828) Upgrade to Spark 2.4.0

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20828:

Description: The Spark community is in the process of releasing Spark 
2.4.0. We should do some testing with the RC candidates and then upgrade once 
the release is finalized.  (was: Spark is in the process of releasing Spark 
2.4.0. We should do something testing with the RC candidates and then upgrade 
once the release is finalized.)

> Upgrade to Spark 2.4.0
> --
>
> Key: HIVE-20828
> URL: https://issues.apache.org/jira/browse/HIVE-20828
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The Spark community is in the process of releasing Spark 2.4.0. We should do 
> some testing with the RC candidates and then upgrade once the release is 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20828) Upgrade to Spark 2.4.0

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20828:
---


> Upgrade to Spark 2.4.0
> --
>
> Key: HIVE-20828
> URL: https://issues.apache.org/jira/browse/HIVE-20828
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Spark is in the process of releasing Spark 2.4.0. We should do something 
> testing with the RC candidates and then upgrade once the release is finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-29 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667403#comment-16667403
 ] 

Sahil Takiar commented on HIVE-20512:
-

+1. Are the test failures related? Given there are no new unit tests for this 
patch, I'm assuming it was at least manually verified to work?

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2018-10-24 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20273:

Attachment: HIVE-20273.2.patch

> Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo
> 
>
> Key: HIVE-20273
> URL: https://issues.apache.org/jira/browse/HIVE-20273
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20273.1.patch, HIVE-20273.2.patch
>
>
> HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
> {{RemoteSparkJobStatus#getSparkJobInfo}} and 
> {{RemoteSparkJobStatus#getSparkStagesInfo}}. Now, these methods catch 
> {{InterruptedException}} and wrap the exception in a {{HiveException}} and 
> then throw the new {{HiveException}}.
> This new {{HiveException}} is then caught in 
> {{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
> match the condition:
> {code:java}
> if (e instanceof InterruptedException ||
> (e instanceof HiveException && e.getCause() instanceof 
> InterruptedException))
> {code}
> If this condition is met (in this case it is), the exception will again be 
> wrapped in another {{HiveException}} and is thrown again. So the final 
> exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
> {{InterruptedException}}.
> The double nesting of {{HiveException}} causes the logic in 
> {{SparkTask#setSparkException}} to break, so {{killJob}} doesn't get 
> triggered.
> This causes interrupted Hive queries to not kill their corresponding Spark 
> jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-10-24 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20519:

Attachment: HIVE-20519.3.patch

> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20519.1.patch, HIVE-20519.2.patch, 
> HIVE-20519.3.patch
>
>
> In HIVE-14162 we added the config {{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-20790) SparkSession should be able to close a session while it is being opened

2018-10-24 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662444#comment-16662444
 ] 

Sahil Takiar edited comment on HIVE-20790 at 10/24/18 3:44 PM:
---

Yet another edge case to consider: guarding against multiple threads calling 
{{open()}} in parallel. HIVE-20737 fixed this issue by re-using the 
{{closeLock()}} to synchronize calls to the {{open()}} method. However, as I 
stated in the RB comments of HIVE-20737, we should use a separate lock to guard 
against this behavior.

We should add some unit tests for this scenario as well.


was (Author: stakiar):
Yet another edge case to consider: guarding against multiple threads calling 
{{open()}} in parallel. HIVE-20737 fixed this issue by re-using the 
{{closeLock()}} to synchronize calls to the {{open()}} method. However, as I 
stated in the RB comments of HIVE-20737, we should use a separate lock to guard 
against this behavior.

> SparkSession should be able to close a session while it is being opened
> ---
>
> Key: HIVE-20790
> URL: https://issues.apache.org/jira/browse/HIVE-20790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Antal Sinkovits
>Priority: Major
>
> In HIVE-14162 we added locks to {{SparkSessionImpl}} to support scenarios 
> where we want to close the session due to a timeout. However, the locks 
> remove the ability to close a session while it is being opened. This is 
> important to allow cancelling of a session while it is being set up. This can 
> be useful on busy clusters where there may not be enough YARN containers to 
> set up the Spark Remote Driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20790) SparkSession should be able to close a session while it is being opened

2018-10-24 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662444#comment-16662444
 ] 

Sahil Takiar commented on HIVE-20790:
-

Yet another edge case to consider: guarding against multiple threads calling 
{{open()}} in parallel. HIVE-20737 fixed this issue by re-using the 
{{closeLock()}} to synchronize calls to the {{open()}} method. However, as I 
stated in the RB comments of HIVE-20737, we should use a separate lock to guard 
against this behavior.

> SparkSession should be able to close a session while it is being opened
> ---
>
> Key: HIVE-20790
> URL: https://issues.apache.org/jira/browse/HIVE-20790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Antal Sinkovits
>Priority: Major
>
> In HIVE-14162 we added locks to {{SparkSessionImpl}} to support scenarios 
> where we want to close the session due to a timeout. However, the locks 
> remove the ability to close a session while it is being opened. This is 
> important to allow cancelling of a session while it is being set up. This can 
> be useful on busy clusters where there may not be enough YARN containers to 
> set up the Spark Remote Driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20790) SparkSession should be able to close a session while it is being opened

2018-10-24 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662428#comment-16662428
 ] 

Sahil Takiar commented on HIVE-20790:
-

As a follow-up to HIVE-20737, we also want to support: "Make getting an opened 
Spark Session + submitting a Spark job an atomic operation". This avoids the 
edge case where a user tries to submit a query while the session is being 
closed. There is a race condition where the code might see that isOpen is true, 
but when it tries to submit the query the session has already been closed by 
another thread. This will cause the submission to fail with an 
{{IllegalStateException}}. Making the check of isOpen and the submission atomic 
avoids this scenario. Essentially, we want to support valid semantics when a 
user submits a query while the session is being closed: if a user submits a 
query while the session is being closed, the submission waits for the close to 
complete and then re-opens the session and submits the job.
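
A minimal sketch of that atomic check-and-submit, assuming hypothetical class, method, and lock names rather than the actual {{SparkSessionImpl}} API:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

public class AtomicSubmitSketch {
  private final ReentrantLock sessionLock = new ReentrantLock();
  private boolean isOpen = false;

  // Checking isOpen and submitting happen under the same lock, so close() cannot
  // sneak in between; a closed session is transparently re-opened instead of
  // failing with IllegalStateException.
  public void submit(Runnable job) throws Exception {
    sessionLock.lock();
    try {
      if (!isOpen) {
        open();
      }
      job.run(); // stand-in for the real job submission
    } finally {
      sessionLock.unlock();
    }
  }

  public void close() {
    sessionLock.lock();
    try {
      isOpen = false; // stand-in for tearing down the remote driver
    } finally {
      sessionLock.unlock();
    }
  }

  private void open() throws Exception {
    isOpen = true; // stand-in for starting the remote driver
  }
}
{code}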

> SparkSession should be able to close a session while it is being opened
> ---
>
> Key: HIVE-20790
> URL: https://issues.apache.org/jira/browse/HIVE-20790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Antal Sinkovits
>Priority: Major
>
> In HIVE-14162 we added locks to {{SparkSessionImpl}} to support scenarios 
> where we want to close the session due to a timeout. However, the locks 
> remove the ability to close a session while it is being opened. This is 
> important to allow cancelling of a session while it is being set up. This can 
> be useful on busy clusters where there may not be enough YARN containers to 
> set up the Spark Remote Driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20737) Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-23 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661206#comment-16661206
 ] 

Sahil Takiar commented on HIVE-20737:
-

Filed HIVE-20790 to fix the issue where we want to call close while open is 
running.

> Local SparkContext is shared between user sessions and should be closed only 
> when there is no active
> 
>
> Key: HIVE-20737
> URL: https://issues.apache.org/jira/browse/HIVE-20737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20737.1.patch, HIVE-20737.10.patch, 
> HIVE-20737.11.patch, HIVE-20737.12.patch, HIVE-20737.2.patch, 
> HIVE-20737.5.patch, HIVE-20737.6.patch, HIVE-20737.7.patch, 
> HIVE-20737.8.patch, HIVE-20737.9.patch
>
>
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active. 
>  2. Possible race condition in SparkSession.open() in case when user queries 
> run in parallel within the same session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-10-23 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661203#comment-16661203
 ] 

Sahil Takiar commented on HIVE-20519:
-

The aforementioned issue will be fixed in HIVE-20790.

> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20519.1.patch, HIVE-20519.2.patch
>
>
> In HIVE-14162 we added the config {{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20488) SparkSubmitSparkClient#launchDriver should parse exceptions, not just errors

2018-10-18 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20488:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

> SparkSubmitSparkClient#launchDriver should parse exceptions, not just errors
> 
>
> Key: HIVE-20488
> URL: https://issues.apache.org/jira/browse/HIVE-20488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20488.1.patch
>
>
> In {{SparkSubmitSparkClient#launchDriver}} we parse the stdout / stderr of 
> {{bin/spark-submit}} for strings that contain "Error", but we should also 
> look for "Exception".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20488) SparkSubmitSparkClient#launchDriver should parse exceptions, not just errors

2018-10-18 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655467#comment-16655467
 ] 

Sahil Takiar commented on HIVE-20488:
-

Pushed to master.

> SparkSubmitSparkClient#launchDriver should parse exceptions, not just errors
> 
>
> Key: HIVE-20488
> URL: https://issues.apache.org/jira/browse/HIVE-20488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20488.1.patch
>
>
> In {{SparkSubmitSparkClient#launchDriver}} we parse the stdout / stderr of 
> {{bin/spark-submit}} for strings that contain "Error", but we should also 
> look for "Exception".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20737) Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-18 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655464#comment-16655464
 ] 

Sahil Takiar commented on HIVE-20737:
-

Making getting an opened Spark Session + submitting a Spark job an atomic 
operation makes sense to me.

> Local SparkContext is shared between user sessions and should be closed only 
> when there is no active
> 
>
> Key: HIVE-20737
> URL: https://issues.apache.org/jira/browse/HIVE-20737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20737.1.patch, HIVE-20737.10.patch, 
> HIVE-20737.11.patch, HIVE-20737.12.patch, HIVE-20737.2.patch, 
> HIVE-20737.5.patch, HIVE-20737.6.patch, HIVE-20737.7.patch, 
> HIVE-20737.8.patch, HIVE-20737.9.patch
>
>
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active. 
>  2. Possible race condition in SparkSession.open() in case when user queries 
> run in parallel within the same session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-20737) Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-18 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655458#comment-16655458
 ] 

Sahil Takiar edited comment on HIVE-20737 at 10/18/18 3:49 PM:
---

{quote}Opening of a Spark session and Job submission should be done as an 
atomic operation.
{quote}
Well we intentionally don't do that. HoS by design de-couples opening a Spark 
session and submitting a job. There are a few reasons:

(1) In order to support static allocation in Spark, we have to open a session 
before we even have a job to submit (e.g. see {{SetSparkReducerParallelism}})

(2) At some point I think we should implement HIVE-17927; the reason is that 
opening a Spark session causes a Spark application to be created, which 
requires resource negotiation with YARN and the spawning of the Spark driver, 
which takes a non-trivial amount of time
{quote}Not to have case when we submit something having already closed session
{quote}
We could just re-open the session if we try to submit a job on a closed session.


was (Author: stakiar):
{quote} Opening of a Spark session and Job submission should be done as an 
atomic operation. {quote}

Well we intentionally don't do that. HoS by design de-couples opening a Spark 
session and submitting a job. There are a few reasons:
(1) In order to support static allocation in Spark, we have to open a session 
before we even have a job to submit (e.g. see {{SetSparkReducerParallelism}})
(2) At some point I think we should implement HIVE-17927; the reason is that 
opening a Spark session causes a Spark application to be created, which 
requires resource negotiation with YARN and the spawning of the Spark driver, 
which takes a non-trivial amount of time

{quote} Not to have case when we submit something having already closed session 
{quote}

We could just re-open the session if we try to submit a job on a closed session.

> Local SparkContext is shared between user sessions and should be closed only 
> when there is no active
> 
>
> Key: HIVE-20737
> URL: https://issues.apache.org/jira/browse/HIVE-20737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20737.1.patch, HIVE-20737.10.patch, 
> HIVE-20737.11.patch, HIVE-20737.12.patch, HIVE-20737.2.patch, 
> HIVE-20737.5.patch, HIVE-20737.6.patch, HIVE-20737.7.patch, 
> HIVE-20737.8.patch, HIVE-20737.9.patch
>
>
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active. 
>  2. Possible race condition in SparkSession.open() in case when user queries 
> run in parallel within the same session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20737) Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-18 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655458#comment-16655458
 ] 

Sahil Takiar commented on HIVE-20737:
-

{quote} Opening of a Spark session and Job submission should be done as an 
atomic operation. {quote}

Well we intentionally don't do that. HoS by design de-couples opening a Spark 
session and submitting a job. There are a few reasons:
(1) In order to support static allocation in Spark, we have to open a session 
before we even have a job to submit (e.g. see {{SetSparkReducerParallelism}})
(2) At some point I think we should implement HIVE-17927; the reason is that 
opening a Spark session causes a Spark application to be created, which 
requires resource negotiation with YARN and the spawning of the Spark driver, 
which takes a non-trivial amount of time

{quote} Not to have case when we submit something having already closed session 
{quote}

We could just re-open the session if we try to submit a job on a closed session.

> Local SparkContext is shared between user sessions and should be closed only 
> when there is no active
> 
>
> Key: HIVE-20737
> URL: https://issues.apache.org/jira/browse/HIVE-20737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20737.1.patch, HIVE-20737.10.patch, 
> HIVE-20737.11.patch, HIVE-20737.12.patch, HIVE-20737.2.patch, 
> HIVE-20737.5.patch, HIVE-20737.6.patch, HIVE-20737.7.patch, 
> HIVE-20737.8.patch, HIVE-20737.9.patch
>
>
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active. 
>  2. Possible race condition in SparkSession.open() in case when user queries 
> run in parallel within the same session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-18 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655439#comment-16655439
 ] 

Sahil Takiar commented on HIVE-20512:
-

A few comments:
(1) I think the logging should be done in a separate thread so that we don't 
have to invoke {{logMemoryInfo()}} for each record, which can add significant 
overhead to per-record processing
(2) I think we should start with a lower interval, something like 15 seconds

You could try to add a unit test that logs to a string buffer and then parses 
that buffer. However, I don't think it's necessary.

CC: [~asinkovits]
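
A minimal sketch of points (1) and (2), assuming a hypothetical helper class rather than the actual SparkRecordHandler code: a background thread logs the counters on a fixed interval so the per-record path only increments a counter:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class IntervalMemoryLogger implements AutoCloseable {
  private final AtomicLong rowCount = new AtomicLong();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public IntervalMemoryLogger(long intervalSeconds) {
    // Log rows processed and current heap usage every intervalSeconds, off the hot path.
    scheduler.scheduleAtFixedRate(() -> {
      long usedBytes = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
      System.out.println("processed " + rowCount.get() + " rows, used memory: " + usedBytes + " bytes");
    }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  // Called once per record: just a counter increment, no logging overhead.
  public void recordProcessed() {
    rowCount.incrementAndGet();
  }

  @Override
  public void close() {
    scheduler.shutdown();
  }
}
{code}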

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20737) Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-18 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655398#comment-16655398
 ] 

Sahil Takiar commented on HIVE-20737:
-

{quote}As the separate JIRA we are planning to refactor SparkSession 
open/submit functionality to be an atomic unit.{quote}
Can you explain what this means?

> Local SparkContext is shared between user sessions and should be closed only 
> when there is no active
> 
>
> Key: HIVE-20737
> URL: https://issues.apache.org/jira/browse/HIVE-20737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20737.1.patch, HIVE-20737.10.patch, 
> HIVE-20737.11.patch, HIVE-20737.12.patch, HIVE-20737.2.patch, 
> HIVE-20737.5.patch, HIVE-20737.6.patch, HIVE-20737.7.patch, 
> HIVE-20737.8.patch, HIVE-20737.9.patch
>
>
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active session. 
>  2. Possible race condition in SparkSession.open() when user queries 
> run in parallel within the same session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20488) SparkSubmitSparkClient#launchDriver should parse exceptions, not just errors

2018-10-16 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651746#comment-16651746
 ] 

Sahil Takiar commented on HIVE-20488:
-

+1 LGTM pending a clean run from Hive QA.

> SparkSubmitSparkClient#launchDriver should parse exceptions, not just errors
> 
>
> Key: HIVE-20488
> URL: https://issues.apache.org/jira/browse/HIVE-20488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20488.1.patch
>
>
> In {{SparkSubmitSparkClient#launchDriver}} we parse the stdout / stderr of 
> {{bin/spark-submit}} for strings that contain "Error", but we should also 
> look for "Exception".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20737) Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-16 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651741#comment-16651741
 ] 

Sahil Takiar commented on HIVE-20737:
-

[~dkuzmenko] left a few comments on the RB.

> Local SparkContext is shared between user sessions and should be closed only 
> when there is no active
> 
>
> Key: HIVE-20737
> URL: https://issues.apache.org/jira/browse/HIVE-20737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20737.1.patch, HIVE-20737.2.patch, 
> HIVE-20737.5.patch, HIVE-20737.6.patch, HIVE-20737.7.patch, 
> HIVE-20737.8.patch, HIVE-20737.9.patch
>
>
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active session. 
>  2. Possible race condition in SparkSession.open() when user queries 
> run in parallel within the same session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20742) SparkSessionManagerImpl maintenance thread only cleans up session once

2018-10-16 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651736#comment-16651736
 ] 

Sahil Takiar commented on HIVE-20742:
-

Nice catch. Overall, LGTM. I think it would be good to add some tests verifying 
that a regular Hive session timeout causes the Spark session to be closed in 
this scenario. Adding some tests for {{SparkSessionManagerImpl#shutdown}} and 
{{SparkSessionManagerImpl#closeSession}} would be good too.

> SparkSessionManagerImpl maintenance thread only cleans up session once
> --
>
> Key: HIVE-20742
> URL: https://issues.apache.org/jira/browse/HIVE-20742
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20742.1.patch, HIVE-20742.2.patch
>
>
> If the client session reconnects, the SparkSessionManagerImpl 
> doesn't put it back into the created sessions, so it will not time out the 
> second time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-10-16 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651727#comment-16651727
 ] 

Sahil Takiar commented on HIVE-20519:
-

Another issue that needs to be fixed with this feature is that we should 
support closing a SparkSession while it is being opened. The old code was 
capable of doing that. Supporting this is important for query cancellation, 
especially on a busy cluster where it might take a while to start the Spark 
driver.
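
A minimal sketch of what cancelling an in-progress open could look like (illustrative names, not the actual SparkSession implementation): the open runs on its own thread and the returned {{Future}} lets a cancellation request interrupt it before the driver is up.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CancellableOpener {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private volatile Future<?> openFuture;

  /** openAction is a placeholder for whatever actually starts the Spark driver. */
  public synchronized void open(Runnable openAction) {
    openFuture = executor.submit(openAction);
  }

  /** Invoked on query cancellation: interrupts the open if it is still in progress. */
  public synchronized void cancel() {
    if (openFuture != null) {
      openFuture.cancel(true);
    }
    executor.shutdownNow();
  }
}
{code}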

> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20519.1.patch, HIVE-20519.2.patch
>
>
> In HIVE-14162 we added the config \{{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-12 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648296#comment-16648296
 ] 

Sahil Takiar commented on HIVE-20512:
-

I mean a time interval: log the # of rows processed every few minutes (or 
something like that). Perhaps an exponentially increasing interval would work 
best so the logs don't explode.
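
A minimal sketch of an exponentially increasing time interval; the 15 second start and 5 minute cap are assumptions for illustration, not values from any patch.

{code:java}
public class BackoffLogInterval {
  private static final long MAX_INTERVAL_MS = 300_000L; // cap at 5 minutes
  private long intervalMs = 15_000L;                    // start small
  private long nextLogTime = System.currentTimeMillis() + intervalMs;

  /** Returns true when it is time to log, then schedules the next (longer) interval. */
  public boolean shouldLog() {
    long now = System.currentTimeMillis();
    if (now < nextLogTime) {
      return false;
    }
    intervalMs = Math.min(intervalMs * 2, MAX_INTERVAL_MS);
    nextLogTime = now + intervalMs;
    return true;
  }
}
{code}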

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Priority: Major
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 1000000) {
>   return currentThreshold + 1000000;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20737) LocalHiveSparkClient and SparkSession race condition fix

2018-10-12 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648287#comment-16648287
 ] 

Sahil Takiar commented on HIVE-20737:
-

Can you describe the race condition?

> LocalHiveSparkClient and SparkSession race condition fix
> 
>
> Key: HIVE-20737
> URL: https://issues.apache.org/jira/browse/HIVE-20737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20737.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-25 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17684:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~mi...@cloudera.com] for the contribution!

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch, HIVE-17684.07.patch, HIVE-17684.08.patch, 
> HIVE-17684.09.patch, HIVE-17684.10.patch, HIVE-17684.11.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-25 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627851#comment-16627851
 ] 

Sahil Takiar commented on HIVE-17684:
-

+1 latest patch LGTM

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch, HIVE-17684.07.patch, HIVE-17684.08.patch, 
> HIVE-17684.09.patch, HIVE-17684.10.patch, HIVE-17684.11.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-20 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17684:

Attachment: HIVE-17684.11.patch

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch, HIVE-17684.07.patch, HIVE-17684.08.patch, 
> HIVE-17684.09.patch, HIVE-17684.10.patch, HIVE-17684.11.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-20 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622731#comment-16622731
 ] 

Sahil Takiar commented on HIVE-17684:
-

[~mi...@cloudera.com] I'm not entirely sure what's going on with the compilation 
failures. They don't seem related to the patch, so I think it's safe to ignore 
them for now. I will look into them separately.

Hive still seems to be able to compile the code successfully, given that it is 
able to run unit tests.

I'm looking into why the HoS tests are failing.

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch, HIVE-17684.07.patch, HIVE-17684.08.patch, 
> HIVE-17684.09.patch, HIVE-17684.10.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20079) Populate more accurate rawDataSize for parquet format

2018-09-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615173#comment-16615173
 ] 

Sahil Takiar commented on HIVE-20079:
-

[~aihuaxu] are you still planning to work on this? If not, mind if I assign it 
to myself?

> Populate more accurate rawDataSize for parquet format
> -
>
> Key: HIVE-20079
> URL: https://issues.apache.org/jira/browse/HIVE-20079
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20079.1.patch, HIVE-20079.2.patch
>
>
> Run the following queries and you will see that rawDataSize for the table is 
> incorrectly reported as 4 (the number of fields). We need to populate the 
> correct data size so the data can be split properly.
> {noformat}
> SET hive.stats.autogather=true;
> CREATE TABLE parquet_stats (id int,str string) STORED AS PARQUET;
> INSERT INTO parquet_stats values(0, 'this is string 0'), (1, 'string 1');
> DESC FORMATTED parquet_stats;
> {noformat}
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles1
>   numRows 2
>   rawDataSize 4
>   totalSize   373
>   transient_lastDdlTime   1530660523
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20079) Populate more accurate rawDataSize for parquet format

2018-09-14 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20079:
---

Assignee: Sahil Takiar  (was: Aihua Xu)

> Populate more accurate rawDataSize for parquet format
> -
>
> Key: HIVE-20079
> URL: https://issues.apache.org/jira/browse/HIVE-20079
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20079.1.patch, HIVE-20079.2.patch
>
>
> Run the following queries and you will see that rawDataSize for the table is 
> incorrectly reported as 4 (the number of fields). We need to populate the 
> correct data size so the data can be split properly.
> {noformat}
> SET hive.stats.autogather=true;
> CREATE TABLE parquet_stats (id int,str string) STORED AS PARQUET;
> INSERT INTO parquet_stats values(0, 'this is string 0'), (1, 'string 1');
> DESC FORMATTED parquet_stats;
> {noformat}
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles1
>   numRows 2
>   rawDataSize 4
>   totalSize   373
>   transient_lastDdlTime   1530660523
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615134#comment-16615134
 ] 

Sahil Takiar commented on HIVE-17684:
-

We haven't seen any issues or complaints about the 
{{MapJoinMemoryExhaustionHandler}} for Hive-on-MR, only for HoS. So I don't 
think we should modify the check for HoMR unless we have more evidence that it 
affects HoMR users.

My guess is that this affects HoS more because in Spark a single JVM can run 
multiple Hive tasks, often in parallel. In HoMR, a new JVM is spawned for each 
Hive task; each JVM runs exactly one Hive task and then shuts down.

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch, HIVE-17684.07.patch, HIVE-17684.08.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2018-09-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615124#comment-16615124
 ] 

Sahil Takiar commented on HIVE-20273:
-

This patch makes the following fixes:

* Fixes the "double-nesting" issue by removing the second clause of the if 
statement mentioned above
* Adds proper and consistent handling of interrupts to {{getWebUIURL}} and 
{{getAppID}} in {{RemoteSparkJobStatus}}
* Adds several unit tests that validate that {{killJob}} is invoked whenever an 
RPC call is interrupted
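
For illustration, a minimal sketch (not the actual {{SparkTask}} code) of how an interrupt can be detected no matter how many {{HiveException}} layers end up wrapping it:

{code:java}
public final class InterruptDetector {
  private InterruptDetector() {}

  /** Walks the cause chain so a double-wrapped InterruptedException is still found. */
  public static boolean causedByInterrupt(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof InterruptedException) {
        return true;
      }
    }
    return false;
  }
}
{code}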

> Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo
> 
>
> Key: HIVE-20273
> URL: https://issues.apache.org/jira/browse/HIVE-20273
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20273.1.patch
>
>
> HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
> {{RemoteSparkJobStatus#getSparkJobInfo}} and 
> {{RemoteSparkJobStatus#getSparkStagesInfo}}. Now, these methods catch 
> {{InterruptedException}} and wrap the exception in a {{HiveException}} and 
> then throw the new {{HiveException}}.
> This new {{HiveException}} is then caught in 
> {{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
> match the condition:
> {code:java}
> if (e instanceof InterruptedException ||
> (e instanceof HiveException && e.getCause() instanceof 
> InterruptedException))
> {code}
> If this condition is met (in this case it is), the exception will again be 
> wrapped in another {{HiveException}} and is thrown again. So the final 
> exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
> {{InterruptedException}}.
> The double nesting of hive exception causes the logic in 
> {{SparkTask#setSparkException}} to break, and doesn't cause {{killJob}} to 
> get triggered.
> This causes interrupted Hive queries to not kill their corresponding Spark 
> jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2018-09-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20273:

Description: 
HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
{{RemoteSparkJobStatus#getSparkJobInfo}} and 
{{RemoteSparkJobStatus#getSparkStagesInfo}}. Now, these methods catch 
{{InterruptedException}} and wrap the exception in a {{HiveException}} and then 
throw the new {{HiveException}}.

This new {{HiveException}} is then caught in 
{{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
match the condition:
{code:java}
if (e instanceof InterruptedException ||
(e instanceof HiveException && e.getCause() instanceof 
InterruptedException))
{code}
If this condition is met (in this case it is), the exception will again be 
wrapped in another {{HiveException}} and is thrown again. So the final 
exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
{{InterruptedException}}.

The double nesting of hive exception causes the logic in 
{{SparkTask#setSparkException}} to break, and doesn't cause {{killJob}} to get 
triggered.

This causes interrupted Hive queries to not kill their corresponding Spark jobs.

  was:
HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
{{RemoteSparkJobStatus#getSparkJobInfo}} and 
{{RemoteSparkJobStatus#getSparkStagesInfo}}. These methods catch 
{{InterruptedException}} and wrap the exception in a {{HiveException}} and then 
throw the new {{HiveException}}.

This new {{HiveException}} is then caught in 
{{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
match the condition:

{code}
if (e instanceof InterruptedException ||
(e instanceof HiveException && e.getCause() instanceof 
InterruptedException))
{code}

If this condition is met (in this case it is), the exception will again be 
wrapped in another {{HiveException}} and is thrown again. So the final 
exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
{{InterruptedException}}.

The double nesting of hive exception causes the logic in 
{{SparkTask#setSparkException}} to break, and doesn't cause {{killJob}} to get 
triggered.


> Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo
> 
>
> Key: HIVE-20273
> URL: https://issues.apache.org/jira/browse/HIVE-20273
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20273.1.patch
>
>
> HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
> {{RemoteSparkJobStatus#getSparkJobInfo}} and 
> {{RemoteSparkJobStatus#getSparkStagesInfo}}. Now, these methods catch 
> {{InterruptedException}} and wrap the exception in a {{HiveException}} and 
> then throw the new {{HiveException}}.
> This new {{HiveException}} is then caught in 
> {{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
> match the condition:
> {code:java}
> if (e instanceof InterruptedException ||
> (e instanceof HiveException && e.getCause() instanceof 
> InterruptedException))
> {code}
> If this condition is met (in this case it is), the exception will again be 
> wrapped in another {{HiveException}} and is thrown again. So the final 
> exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
> {{InterruptedException}}.
> The double nesting of hive exception causes the logic in 
> {{SparkTask#setSparkException}} to break, and doesn't cause {{killJob}} to 
> get triggered.
> This causes interrupted Hive queries to not kill their corresponding Spark 
> jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2018-09-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20273:

Description: 
HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
{{RemoteSparkJobStatus#getSparkJobInfo}} and 
{{RemoteSparkJobStatus#getSparkStagesInfo}}. These methods catch 
{{InterruptedException}} and wrap the exception in a {{HiveException}} and then 
throw the new {{HiveException}}.

This new {{HiveException}} is then caught in 
{{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
match the condition:

{code}
if (e instanceof InterruptedException ||
(e instanceof HiveException && e.getCause() instanceof 
InterruptedException))
{code}

If this condition is met (in this case it is), the exception will again be 
wrapped in another {{HiveException}} and is thrown again. So the final 
exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
{{InterruptedException}}.

The double nesting of hive exception causes the logic in 
{{SparkTask#setSparkException}} to break, and doesn't cause {{killJob}} to get 
triggered.

  was:HIVE-19053 and HIVE-19733 add handling of {{InterruptedException}} to 
{{#getSparkJobInfo}} and {{#getSparkStagesInfo}} in {{RemoteSparkJobStatus}}, 
but that means the {{InterruptedException}} is wrapped in a {{HiveException}} 
and then thrown. The {{HiveException}} is then caught in 
{{RemoteSparkJobMonitor}} and then wrapped in another Hive exception. The 
double nesting of hive exception causes the logic in 
{{SparkTask#setSparkException}} to break, and it doesn't kill the job if an 
interrupted exception is thrown.


> Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo
> 
>
> Key: HIVE-20273
> URL: https://issues.apache.org/jira/browse/HIVE-20273
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20273.1.patch
>
>
> HIVE-19053 and HIVE-19733 added handling of {{InterruptedException}} to 
> {{RemoteSparkJobStatus#getSparkJobInfo}} and 
> {{RemoteSparkJobStatus#getSparkStagesInfo}}. These methods catch 
> {{InterruptedException}} and wrap the exception in a {{HiveException}} and 
> then throw the new {{HiveException}}.
> This new {{HiveException}} is then caught in 
> {{RemoteSparkJobMonitor#startMonitor}} which then looks for exceptions that 
> match the condition:
> {code}
> if (e instanceof InterruptedException ||
> (e instanceof HiveException && e.getCause() instanceof 
> InterruptedException))
> {code}
> If this condition is met (in this case it is), the exception will again be 
> wrapped in another {{HiveException}} and is thrown again. So the final 
> exception is a {{HiveException}} that wraps a {{HiveException}} that wraps an 
> {{InterruptedException}}.
> The double nesting of hive exception causes the logic in 
> {{SparkTask#setSparkException}} to break, and doesn't cause {{killJob}} to 
> get triggered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-09-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20519:

Attachment: HIVE-20519.2.patch

> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20519.1.patch, HIVE-20519.2.patch
>
>
> In HIVE-14162 we added the config \{{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-13 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614097#comment-16614097
 ] 

Sahil Takiar commented on HIVE-17684:
-

[~mi...@cloudera.com] I attached an updated patch. I did a good amount of 
refactoring in the patch. The main reason is that we only want these changes 
to apply to Hive-on-Spark, not Hive-on-MR (at least for now), since we are only 
seeing issues when running Hive-on-Spark, not Hive-on-MR.

The main changes are as follows:
* Created a new interface called {{MemoryExhaustionChecker}} which has two 
implementations (a rough sketch of the shape is included below):
** {{DefaultMemoryExhaustionChecker}} preserves the old logic - i.e. it uses 
{{MapJoinMemoryExhaustionHandler}}
** {{SparkMemoryExhaustionChecker}} uses the new logic you added - i.e. the 
{{GcTimeMonitor}}
* Depending on the execution engine, {{HashTableSinkOperator}} will use one of 
the above classes to check whether memory has been exhausted
* I changed {{hive.mapjoin.max.gc.time.percentage}} to be a value between 0 and 
1 to make the config more consistent with the rest of Hive configs.

Let me know what you think. These new changes should also fix the test failures 
you were seeing.
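
A rough sketch of the shape described above; the class names match the description, but the method signatures and details are illustrative, not the patch itself.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

interface MemoryExhaustionChecker {
  void checkMemoryStatus(long tableSize, long numRows);
}

// Old behaviour: ratio of used to max heap, as MapJoinMemoryExhaustionHandler does.
class DefaultMemoryExhaustionChecker implements MemoryExhaustionChecker {
  private final double maxMemoryUsage; // e.g. hive.mapjoin.localtask.max.memory.usage

  DefaultMemoryExhaustionChecker(double maxMemoryUsage) {
    this.maxMemoryUsage = maxMemoryUsage;
  }

  @Override
  public void checkMemoryStatus(long tableSize, long numRows) {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    double usedFraction = (double) heap.getUsed() / heap.getMax();
    if (usedFraction > maxMemoryUsage) {
      throw new IllegalStateException("Hash table is using " + usedFraction + " of the heap");
    }
  }
}

// New HoS behaviour: alarm when the fraction of recent time spent in GC crosses a
// threshold; the flag would be flipped by a GcTimeMonitor-style callback.
class SparkMemoryExhaustionChecker implements MemoryExhaustionChecker {
  volatile boolean gcTimeAlarmFired;

  @Override
  public void checkMemoryStatus(long tableSize, long numRows) {
    if (gcTimeAlarmFired) {
      throw new IllegalStateException("Spent too much time in GC while building the hash table");
    }
  }
}
{code}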

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch, HIVE-17684.07.patch, HIVE-17684.08.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17684:

Attachment: HIVE-17684.08.patch

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch, HIVE-17684.07.patch, HIVE-17684.08.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20440) Create better cache eviction policy for SmallTableCache

2018-09-13 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613811#comment-16613811
 ] 

Sahil Takiar commented on HIVE-20440:
-

Makes sense to me.

> Create better cache eviction policy for SmallTableCache
> ---
>
> Key: HIVE-20440
> URL: https://issues.apache.org/jira/browse/HIVE-20440
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20440.01.patch, HIVE-20440.02.patch, 
> HIVE-20440.03.patch, HIVE-20440.04.patch
>
>
> Enhance the SmallTableCache, to use guava cache with soft references, so that 
> we evict when there is memory pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19814) RPC Server port is always random for spark

2018-09-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-19814:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Bharath for the contribution!

> RPC Server port is always random for spark
> --
>
> Key: HIVE-19814
> URL: https://issues.apache.org/jira/browse/HIVE-19814
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.0, 3.0.0, 2.4.0, 4.0.0
>Reporter: bounkong khamphousone
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19814.1.patch, HIVE-19814.2.patch, 
> HIVE-19814.3.patch
>
>
> RPC server port is always a random one. In fact, the problem is in 
> RpcConfiguration.HIVE_SPARK_RSC_CONFIGS which doesn't include 
> SPARK_RPC_SERVER_PORT.
>  
> I found this issue while trying to run hive-on-spark inside 
> docker.
>  
> HIVE_SPARK_RSC_CONFIGS is used by HiveSparkClientFactory.initiateSparkConf 
> > SparkSessionManagerImpl.setup, and the latter calls 
> SparkClientFactory.initialize(conf), which initializes the RPC server. This 
> RpcServer is then used to create the SparkClient, which uses the RPC server 
> port as the --remote-port arg. Since initiateSparkConf ignores 
> SPARK_RPC_SERVER_PORT, it will always be a random port.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-09-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20519:

Status: Patch Available  (was: Open)

> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20519.1.patch
>
>
> In HIVE-14162 we added the config \{{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-09-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20519:

Attachment: HIVE-20519.1.patch

> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20519.1.patch
>
>
> In HIVE-14162 we added the config \{{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19814) RPC Server port is always random for spark

2018-09-12 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612515#comment-16612515
 ] 

Sahil Takiar commented on HIVE-19814:
-

I think this patch is good to merge. I have a fix for the flakiness of 
{{TestSparkSessionTimeout}} that I'm planning to merge in HIVE-20519, plus you 
got a green run earlier and the code (besides the new unit test) hasn't changed 
since then.

> RPC Server port is always random for spark
> --
>
> Key: HIVE-19814
> URL: https://issues.apache.org/jira/browse/HIVE-19814
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.0, 3.0.0, 2.4.0, 4.0.0
>Reporter: bounkong khamphousone
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-19814.1.patch, HIVE-19814.2.patch, 
> HIVE-19814.3.patch
>
>
> RPC server port is always a random one. In fact, the problem is in 
> RpcConfiguration.HIVE_SPARK_RSC_CONFIGS which doesn't include 
> SPARK_RPC_SERVER_PORT.
>  
> I found this issue while trying to run hive-on-spark inside 
> docker.
>  
> HIVE_SPARK_RSC_CONFIGS is used by HiveSparkClientFactory.initiateSparkConf 
> > SparkSessionManagerImpl.setup, and the latter calls 
> SparkClientFactory.initialize(conf), which initializes the RPC server. This 
> RpcServer is then used to create the SparkClient, which uses the RPC server 
> port as the --remote-port arg. Since initiateSparkConf ignores 
> SPARK_RPC_SERVER_PORT, it will always be a random port.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-11 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610774#comment-16610774
 ] 

Sahil Takiar commented on HIVE-17684:
-

Sounds good to me.

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19814) RPC Server port is always random for spark

2018-09-11 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610771#comment-16610771
 ] 

Sahil Takiar commented on HIVE-19814:
-

+1 pending Hive QA

> RPC Server port is always random for spark
> --
>
> Key: HIVE-19814
> URL: https://issues.apache.org/jira/browse/HIVE-19814
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.0, 3.0.0, 2.4.0, 4.0.0
>Reporter: bounkong khamphousone
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-19814.1.patch, HIVE-19814.2.patch, 
> HIVE-19814.3.patch
>
>
> RPC server port is always a random one. In fact, the problem is in 
> RpcConfiguration.HIVE_SPARK_RSC_CONFIGS which doesn't include 
> SPARK_RPC_SERVER_PORT.
>  
> I found this issue while trying to run hive-on-spark inside 
> docker.
>  
> HIVE_SPARK_RSC_CONFIGS is used by HiveSparkClientFactory.initiateSparkConf 
> > SparkSessionManagerImpl.setup, and the latter calls 
> SparkClientFactory.initialize(conf), which initializes the RPC server. This 
> RpcServer is then used to create the SparkClient, which uses the RPC server 
> port as the --remote-port arg. Since initiateSparkConf ignores 
> SPARK_RPC_SERVER_PORT, it will always be a random port.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20506) HOS times out when cluster is full while Hive-on-MR waits

2018-09-10 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609504#comment-16609504
 ] 

Sahil Takiar commented on HIVE-20506:
-

I think you can ignore it. PTest is able to apply it successfully so it 
shouldn't be an issue.

> HOS times out when cluster is full while Hive-on-MR waits
> -
>
> Key: HIVE-20506
> URL: https://issues.apache.org/jira/browse/HIVE-20506
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Major
> Attachments: HIVE-20506-CDH5.14.2.patch, HIVE-20506.1.patch, 
> HIVE-20506.2.patch, HIVE-20506.3.patch, Screen Shot 2018-09-07 at 8.10.37 
> AM.png
>
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available 
> before submitting a job. This is because the hadoop jar command is the 
> primary mechanism Hive uses to know if a job is complete or failed.
>  
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because 
> the RPC client in the AppMaster doesn't connect back to the RPC Server in 
> HS2. 
> This is a behavior difference that it'd be great to close.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20440) Create better cache eviction policy for SmallTableCache

2018-09-10 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609496#comment-16609496
 ] 

Sahil Takiar commented on HIVE-20440:
-

Then it might be better to have a two-level cache: the first level has 
time-based eviction and an eviction handler that demotes entries into a second 
cache that uses soft references.
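
A minimal sketch of that two-level idea using Guava (already on Hive's classpath); the 5 minute idle window is an assumption for illustration, not a value from the patch.

{code:java}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalCause;
import com.google.common.cache.RemovalListener;

public class TwoLevelCache<K, V> {

  // Level 2: reclaimable by the JVM under memory pressure.
  private final Cache<K, V> softLevel = CacheBuilder.newBuilder()
      .softValues()
      .build();

  // On expiry from level 1, demote the entry to the soft-reference level.
  private final RemovalListener<K, V> demoteOnExpiry = notification -> {
    if (notification.getCause() == RemovalCause.EXPIRED) {
      softLevel.put(notification.getKey(), notification.getValue());
    }
  };

  // Level 1: strong references with time-based eviction.
  private final Cache<K, V> timedLevel = CacheBuilder.newBuilder()
      .expireAfterAccess(5, TimeUnit.MINUTES)
      .removalListener(demoteOnExpiry)
      .build();

  public void put(K key, V value) {
    timedLevel.put(key, value);
  }

  public V get(K key) {
    V value = timedLevel.getIfPresent(key);
    if (value == null) {
      value = softLevel.getIfPresent(key);
      if (value != null) {
        timedLevel.put(key, value); // promote back to level 1 on access
      }
    }
    return value;
  }
}
{code}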

> Create better cache eviction policy for SmallTableCache
> ---
>
> Key: HIVE-20440
> URL: https://issues.apache.org/jira/browse/HIVE-20440
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20440.01.patch, HIVE-20440.02.patch, 
> HIVE-20440.03.patch, HIVE-20440.04.patch
>
>
> Enhance the SmallTableCache, to use guava cache with soft references, so that 
> we evict when there is memory pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-07 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607622#comment-16607622
 ] 

Sahil Takiar commented on HIVE-17684:
-

[~mi...@cloudera.com] yes we can configure it that way. We just have to add a 
new config to {{HiveConf.java}} and then we can set a lower value for all of 
our tests (we can do this by modifying the {{hive-site.xml}} files under 
{{data/conf}}). The default value can be somewhere around 50.

Let me know if you need help adding the new config variable.

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-09-07 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20519:
---


> Remove 30m min value for hive.spark.session.timeout
> ---
>
> Key: HIVE-20519
> URL: https://issues.apache.org/jira/browse/HIVE-20519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> In HIVE-14162 we added the config \{{hive.spark.session.timeout}} which 
> provided a way to time out Spark sessions that are active for a long period 
> of time. The config has a lower bound of 30m which we should remove. It 
> should be possible for users to configure this value so the HoS session is 
> closed as soon as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20506) HOS times out when cluster is full while Hive-on-MR waits

2018-09-07 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607614#comment-16607614
 ] 

Sahil Takiar commented on HIVE-20506:
-

Yeah, I'll file a follow-up JIRA to do the refactoring. Otherwise +1 pending 
tests.

> HOS times out when cluster is full while Hive-on-MR waits
> -
>
> Key: HIVE-20506
> URL: https://issues.apache.org/jira/browse/HIVE-20506
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Major
> Attachments: HIVE-20506-CDH5.14.2.patch, HIVE-20506.1.patch, Screen 
> Shot 2018-09-07 at 8.10.37 AM.png
>
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available 
> before submitting a job. This is because the hadoop jar command is the 
> primary mechanism Hive uses to know if a job is complete or failed.
>  
> Hive-on-Spark will timeout after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because 
> the RPC client in the AppMaster doesn't connect back to the RPC Server in 
> HS2. 
> This is a behavior difference it'd be great to close.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20506) HOS times out when cluster is full while Hive-on-MR waits

2018-09-07 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607431#comment-16607431
 ] 

Sahil Takiar commented on HIVE-20506:
-

The general idea makes sense to me. To confirm my understanding this change 
will essentially do the following:
* Parse the {{spark-submit}} logs and look for the YARN application id
* Create a {{YarnClient}} and check the state of the YARN app
* Check whether the app is in {{ACCEPTED}} state (which means it has been 
acknowledged by YARN, but hasn't actually been started yet)
* As long as the app is in {{ACCEPTED}} state, extend the timeout until it 
transitions out of this state

Is that correct?

If that's the case, then I just have a few comments:
* Rather than extending the timeout, why not just create two separate ones? One 
timeout for launching {{bin/spark-submit}} --> app = ACCEPTED, and another for 
app = RUNNING --> connection established.
** We probably don't want to change the meaning of the current timeout for 
backwards compatibility, so maybe we could deprecate the existing one and 
replace it with two new ones?
* Is there any way to avoid creating a {{YarnClient}}? I guess this is 
mitigated slightly by the fact that you only create the client if the timeout 
is triggered
** Just concerned about the overhead of creating a {{YarnClient}} + would this 
work on a secure cluster?
** {{bin/spark-submit}} should print out something like {{Application report 
for ... (state: ACCEPTED)}}; perhaps we can parse the state from the logs? (see 
the sketch after this list)
* Can we move all the changes in {{RpcServer}} to a separate class? That class 
is really meant to act as a generic RPC framework that is relatively 
independent of the HoS logic.
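
To make the log-parsing idea concrete, here is a rough sketch (the class name, the 
regex, and the exact log format are my assumptions, not part of the patch) of pulling 
the YARN application state out of the {{bin/spark-submit}} output:

{code:java}
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the actual patch: extracts the YARN app state from a
// spark-submit log line such as
//   "Application report for application_1536300000000_0001 (state: ACCEPTED)".
public class SparkSubmitStateParser {

  private static final Pattern REPORT_LINE = Pattern.compile(
      "Application report for (application_\\d+_\\d+) \\(state: ([A-Z_]+)\\)");

  public static Optional<String> parseState(String logLine) {
    Matcher m = REPORT_LINE.matcher(logLine);
    return m.find() ? Optional.of(m.group(2)) : Optional.empty();
  }

  public static void main(String[] args) {
    String line = "18/09/07 08:10:37 INFO yarn.Client: Application report for "
        + "application_1536300000000_0001 (state: ACCEPTED)";
    System.out.println(parseState(line));  // prints Optional[ACCEPTED]
  }
}
{code}

The monitoring side could then keep extending its deadline for as long as the parsed 
state stays {{ACCEPTED}}, without ever needing to create a {{YarnClient}}.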

> HOS times out when cluster is full while Hive-on-MR waits
> -
>
> Key: HIVE-20506
> URL: https://issues.apache.org/jira/browse/HIVE-20506
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Major
> Attachments: HIVE-20506-CDH5.14.2.patch, HIVE-20506.1.patch, Screen 
> Shot 2018-09-07 at 8.10.37 AM.png
>
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available 
> before submitting a job. This is because the hadoop jar command is the 
> primary mechanism Hive uses to know if a job is complete or failed.
>  
> Hive-on-Spark will timeout after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because 
> the RPC client in the AppMaster doesn't connect back to the RPC Server in 
> HS2. 
> This is a behavior difference it'd be great to close.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-06 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606614#comment-16606614
 ] 

Sahil Takiar commented on HIVE-17684:
-

[~mi...@cloudera.com] took a look through some of the test fails and I'm seeing 
exceptions like this:

{code} mapjoin.MapJoinMemoryExhaustionError: GC time percentage = 111, exceeded 
threshold. {code}

Should that be possible? I would read that as 111% of the time is spent in GC, 
right? Could there be a bug somewhere causing this?

For the sake of this JIRA, maybe if HIVE_IN_TEST is set to true, we should just 
skip the check altogether (i.e. never throw the exception).
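
A minimal sketch of that suggestion, assuming the handler has a {{HiveConf}} in scope 
(the class and method names below are made up for illustration):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionError;

// Illustrative only: skip the GC-overhead check entirely when running under tests.
public final class GcOverheadCheckSketch {
  public static void check(HiveConf conf, double gcTimePercentage, double threshold) {
    if (conf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST)) {
      return;  // never throw in tests, per the comment above
    }
    if (gcTimePercentage > threshold) {
      throw new MapJoinMemoryExhaustionError("GC time percentage = "
          + gcTimePercentage + ", exceeded threshold " + threshold);
    }
  }
}
{code}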

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-06 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606358#comment-16606358
 ] 

Sahil Takiar commented on HIVE-17684:
-

[~mi...@cloudera.com] will do. The logs got cleaned up, so re-attaching the patch.

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-06 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17684:

Attachment: HIVE-17684.06.patch

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch, 
> HIVE-17684.06.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20440) Create better cache eviction policy for SmallTableCache

2018-09-06 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606353#comment-16606353
 ] 

Sahil Takiar commented on HIVE-20440:
-

Few high level comments:

(1) The Guava cache only performs cleanup on writes, and occasionally during 
reads. Since we don't expect this cache to have a large # of reads or writes, we 
need to schedule a thread to call {{Cache#cleanUp}} periodically; see 
https://github.com/google/guava/wiki/CachesExplained#when-does-cleanup-happen 
for details.

(2) It would be ideal if we could use {{expireAfterAccess}} as well as 
{{softValues}} (not sure if the Guava cache allows this). We can set an 
expiration of say 30 seconds. This is beneficial in the case where Spark tasks 
perform in lock step, and all start and end at the same time. If this happens, 
then {{softValues}} might evict the hash table as soon as each batch of Spark 
tasks has completed. Adding a 30 seconds delay to eviction should allow enough 
time for the next batch of Spark tasks to be scheduled.
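
A rough sketch of (1) and (2) combined, just to show the shape of it (Guava does 
allow {{softValues}} together with {{expireAfterAccess}}; the class and key names 
here are illustrative, not the actual patch):

{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SmallTableCacheSketch {
  public static void main(String[] args) {
    Cache<String, Object> cache = CacheBuilder.newBuilder()
        .softValues()                              // evict under memory pressure
        .expireAfterAccess(30, TimeUnit.SECONDS)   // keep entries alive between task waves
        .build();

    // Guava only cleans up during writes (and occasionally during reads), so drive
    // cleanup explicitly from a background thread.
    ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
    cleaner.scheduleAtFixedRate(cache::cleanUp, 30, 30, TimeUnit.SECONDS);

    cache.put("small-table-1", new Object());
    System.out.println("cached entries: " + cache.size());
    cleaner.shutdownNow();
  }
}
{code}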

> Create better cache eviction policy for SmallTableCache
> ---
>
> Key: HIVE-20440
> URL: https://issues.apache.org/jira/browse/HIVE-20440
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20440.01.patch, HIVE-20440.02.patch, 
> HIVE-20440.03.patch, HIVE-20440.04.patch
>
>
> Enhance the SmallTableCache, to use guava cache with soft references, so that 
> we evict when there is memory pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20506) HOS times out when cluster is full while Hive-on-MR waits

2018-09-05 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604809#comment-16604809
 ] 

Sahil Takiar commented on HIVE-20506:
-

[~brocknoland] I think you might be hitting the 
{{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} timeout. The 
{{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} is the timeout for how long SASL 
negotiation takes between the {{RemoteDriver}} and HS2 (yes, I know it's a bit 
confusing).

{{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} is set to 90 seconds by default. So HoS 
will essentially wait 90 seconds for the Spark application to be submitted. The 
app has to be submitted and accepted by YARN, and the {{RemoteDriver}} has to 
start up and connect back to HS2, all within 90 seconds. Essentially, if the 
cluster is busy, HoS will wait 90 seconds for the cluster to free up enough 
resources for the Spark app to start before issuing a timeout.

Is my understanding of your problem correct?

I agree we should make the HoS behavior as close to the HoMR behavior as 
possible. I'm not entirely sure what HoMR does. Is there a timeout for the 
MapReduce application to be accepted?
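
In the meantime, raising the handshake timeout gives a busy cluster more time to 
accept and start the app before HS2 gives up. A sketch of the knob involved, assuming 
I'm remembering the property name and time format correctly (please double-check 
against HiveConf):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

// Illustration only -- not a recommendation of a specific value.
public class HandshakeTimeoutSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT defaults to 90000ms; allow 5 minutes instead.
    conf.set("hive.spark.client.server.connect.timeout", "300000ms");
    System.out.println(conf.get("hive.spark.client.server.connect.timeout"));
  }
}
{code}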

> HOS times out when cluster is full while Hive-on-MR waits
> -
>
> Key: HIVE-20506
> URL: https://issues.apache.org/jira/browse/HIVE-20506
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Priority: Major
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available 
> before submitting a job. This is because the hadoop jar command is the 
> primary mechanism Hive uses to know if a job is complete or failed.
>  
> Hive-on-Spark will timeout after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because 
> the RPC client in the AppMaster doesn't connect back to the RPC Server in 
> HS2. 
> This is a behavior difference it'd be great to close.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-04 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603530#comment-16603530
 ] 

Sahil Takiar commented on HIVE-17684:
-

[~mi...@cloudera.com] sorry for the delay on this. I figured out why a bunch of 
the {{TestSparkCliDriver}} tests were failing and attached an updated patch 
with a fix.

As for the issues with {{auto_join25.q.out}} - it looks like there is a config 
called {{hive.mapjoin.localtask.max.memory.usage}} / 
{{hive.mapjoin.followby.gby.localtask.max.memory.usage}} which defines how much 
memory the small table can consume before the memory exhaustion handler throws 
an error. These tests define a very low value for these configs and thus expect 
the tests to trigger the memory exhaustion handler.

We should probably do something similar. Introduce a new config that makes 
{{CRITICAL_GC_TIME_PERCENTAGE_PROD}} configurable. We can set it to a lower 
value in our tests in order to confirm that everything is working correctly.

Let me know if you need more help getting this done.
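
For illustration, the new knob could look roughly like this in {{HiveConf.ConfVars}} 
(the variable name, property name, and default below are placeholders, not 
necessarily what the patch will use):

{code:java}
// Hypothetical ConfVars entry -- names and default are illustrative only.
HIVE_MAPJOIN_CRITICAL_GC_TIME_PERCENTAGE("hive.mapjoin.critical.gc.time.percentage", 50,
    "Percentage of wall-clock time spent in GC above which a Hive-on-Spark map join is "
        + "aborted with a MapJoinMemoryExhaustionError. Tests (e.g. via data/conf) can "
        + "set this very low to reliably exercise the error path."),
{code}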

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2018-09-04 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17684:

Attachment: HIVE-17684.05.patch

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, 
> HIVE-17684.03.patch, HIVE-17684.04.patch, HIVE-17684.05.patch
>
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use of 
> the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking too much space in memory, in which 
> case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs to control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler works by using the {{MemoryMXBean}} and uses the following logic 
> to estimate how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate. The value returned by this method includes all reachable and 
> unreachable memory on the heap, so there may be a bunch of garbage data, and 
> the JVM just hasn't taken the time to reclaim it all. This can lead to 
> intermittent failures of this check even though a simple GC would have 
> reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense to use because every Hive task was run 
> in a dedicated container, so a Hive Task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive Tasks 
> running in a single executor, each doing different things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-31 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599265#comment-16599265
 ] 

Sahil Takiar commented on HIVE-14162:
-

[~Tagar] it was pushed to master, which is Hive 4.0.0. I can work on 
backporting it to at least branch-3 and branch-2. I will try branch-1 but I'm 
not sure it will work.

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch, HIVE-14162.8.patch, HIVE-14162.9.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-31 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14162:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Adam for the review!

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch, HIVE-14162.8.patch, HIVE-14162.9.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-31 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599260#comment-16599260
 ] 

Sahil Takiar commented on HIVE-14162:
-

{{TestCliDriver.testCliDriver[test_teradatabinaryfile]}} was failing 
consistently, and it looks like it just got fixed in HIVE-20225. So I'm going to merge 
this now.

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch, HIVE-14162.8.patch, HIVE-14162.9.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-31 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14162:

Attachment: HIVE-14162.9.patch

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch, HIVE-14162.8.patch, HIVE-14162.9.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-31 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598793#comment-16598793
 ] 

Sahil Takiar commented on HIVE-14162:
-

Thanks Adam. Test failures don't look related, but rebased the patch + fixed 
some checkstyle issues. Hopefully the next run of Hive QA is green.

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch, HIVE-14162.8.patch, HIVE-14162.9.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-30 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597637#comment-16597637
 ] 

Sahil Takiar commented on HIVE-14162:
-

[~szita] thanks for taking a look. Addressed your comments and added some more 
javadocs to make the code easier to understand. Updated the RB.

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch, HIVE-14162.8.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-30 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14162:

Attachment: HIVE-14162.8.patch

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch, HIVE-14162.8.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20479) Update content/people.mdtext in cms

2018-08-28 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595364#comment-16595364
 ] 

Sahil Takiar commented on HIVE-20479:
-

+1

> Update content/people.mdtext in cms 
> 
>
> Key: HIVE-20479
> URL: https://issues.apache.org/jira/browse/HIVE-20479
> Project: Hive
>  Issue Type: Task
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Major
>
> I added myself to the committers list. 
>  
> {code:java}
> asherman 
> Andrew Sherman 
> <a href="http://cloudera.com/">Cloudera</a> 
>  
> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19814) RPC Server port is always random for spark

2018-08-22 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589134#comment-16589134
 ] 

Sahil Takiar commented on HIVE-19814:
-

[~bharos92] the new unit test looks good. Just one minor comment: inside 
{{testServerPortAssignment}} it would be good to open a new {{SparkSession}} 
using the {{SparkSessionManager}}. This should help verify that the new 
{{SparkSession}} is successfully able to connect to a {{RpcServer}} with a 
custom port.
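
Something along these lines is what I have in mind (the API calls below are from 
memory, so treat this as a sketch to adapt rather than exact test code):

{code:java}
// Sketch only -- adjust to the actual SparkSessionManager/SparkSession API in the test.
HiveConf conf = new HiveConf();
conf.set("hive.spark.client.rpc.server.port", "30000-30010");  // custom port range

SparkSessionManager manager = SparkSessionManagerImpl.getInstance();
manager.setup(conf);
SparkSession session = manager.getSession(null, conf, true);  // opens a new session
assertTrue(session.isOpen());  // RemoteDriver connected back through the custom-port RpcServer
manager.shutdown();
{code}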

> RPC Server port is always random for spark
> --
>
> Key: HIVE-19814
> URL: https://issues.apache.org/jira/browse/HIVE-19814
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.0, 3.0.0, 2.4.0, 4.0.0
>Reporter: bounkong khamphousone
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-19814.1.patch, HIVE-19814.2.patch
>
>
> RPC server port is always a random one. In fact, the problem is in 
> RpcConfiguration.HIVE_SPARK_RSC_CONFIGS which doesn't include 
> SPARK_RPC_SERVER_PORT.
>  
> I've found this issue while trying to make hive-on-spark running inside 
> docker.
>  
> HIVE_SPARK_RSC_CONFIGS is called by HiveSparkClientFactory.initiateSparkConf 
> > SparkSessionManagerImpl.setup, and the latter calls 
> SparkClientFactory.initialize(conf), which initializes the rpc server. This 
> RPCServer is then used to create the sparkClient, which uses the rpc server 
> port as the --remote-port arg. Since initiateSparkConf ignores 
> SPARK_RPC_SERVER_PORT, it will always be a random port.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-22 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588781#comment-16588781
 ] 

Sahil Takiar commented on HIVE-14162:
-

[~szita], [~asinkovits] could you take a look?

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19008) Improve Spark session id logging

2018-08-06 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-19008:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks Aihua for the review!

> Improve Spark session id logging
> 
>
> Key: HIVE-19008
> URL: https://issues.apache.org/jira/browse/HIVE-19008
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19008.1.patch, HIVE-19008.2.patch
>
>
> HoS users have two session ids, one id for the Hive session and another id 
> for the Spark session; both are UUIDs.
> I think some improvements could be made here:
> The Spark session id could just be a counter that is incremented for each new 
> Spark session within a Hive session. Each Spark session is still globally 
> identifiable by its associated Hive session id + its own counter. This may 
> make more sense since the Hive session to Spark session relationship is 
> 1-to-many: a single Hive session can contain multiple Spark 
> sessions, and each Spark session must belong to a Hive session.
> Furthermore, we should include both the Hive session id and Spark session id 
> in the console logs + the Spark Web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19937) Intern fields in MapWork on deserialization

2018-08-06 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-19937:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Thanks for taking a look, Vihang. I addressed your comments and attached an 
updated patch. Since the change was just to add new comments, I don't think it's 
necessary to re-run Hive QA, so I went ahead and pushed this to master.

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> HIVE-19937.6.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory; we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19937) Intern fields in MapWork on deserialization

2018-08-06 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-19937:

Attachment: HIVE-19937.6.patch

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> HIVE-19937.6.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory; we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-06 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569874#comment-16569874
 ] 

Sahil Takiar commented on HIVE-14162:
-

[~ngangam], [~aihuaxu] could you take a look at this patch? I created an RB 
with a detailed description of the code changes - 
https://reviews.apache.org/r/68223/

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-05 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14162:

Attachment: HIVE-14162.7.patch

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, 
> HIVE-14162.6.patch, HIVE-14162.7.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-03 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14162:

Attachment: HIVE-14162.6.patch

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, HIVE-14162.6.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-02 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14162:

Attachment: HIVE-14162.5.patch

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2018-08-02 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20273:

Status: Patch Available  (was: Open)

> Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo
> 
>
> Key: HIVE-20273
> URL: https://issues.apache.org/jira/browse/HIVE-20273
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20273.1.patch
>
>
> HIVE-19053 and HIVE-19733 add handling of {{InterruptedException}} to 
> {{#getSparkJobInfo}} and {{#getSparkStagesInfo}} in {{RemoteSparkJobStatus}}, 
> but that means the {{InterruptedException}} is wrapped in a {{HiveException}} 
> and then thrown. The {{HiveException}} is then caught in 
> {{RemoteSparkJobMonitor}} and then wrapped in another Hive exception. The 
> double nesting of hive exception causes the logic in 
> {{SparkTask#setSparkException}} to break, and it doesn't kill the job if an 
> interrupted exception is thrown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2018-08-02 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-20273:

Attachment: HIVE-20273.1.patch

> Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo
> 
>
> Key: HIVE-20273
> URL: https://issues.apache.org/jira/browse/HIVE-20273
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20273.1.patch
>
>
> HIVE-19053 and HIVE-19733 add handling of {{InterruptedException}} to 
> {{#getSparkJobInfo}} and {{#getSparkStagesInfo}} in {{RemoteSparkJobStatus}}, 
> but that means the {{InterruptedException}} is wrapped in a {{HiveException}} 
> and then thrown. The {{HiveException}} is then caught in 
> {{RemoteSparkJobMonitor}} and then wrapped in another Hive exception. The 
> double nesting of hive exception causes the logic in 
> {{SparkTask#setSparkException}} to break, and it doesn't kill the job if an 
> interrupted exception is thrown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19008) Improve Spark session id logging

2018-08-02 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567404#comment-16567404
 ] 

Sahil Takiar commented on HIVE-19008:
-

Ping [~aihuaxu]

> Improve Spark session id logging
> 
>
> Key: HIVE-19008
> URL: https://issues.apache.org/jira/browse/HIVE-19008
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19008.1.patch, HIVE-19008.2.patch
>
>
> HoS users have two session ids, one id for the Hive session and another id 
> for the Spark session; both are UUIDs.
> I think some improvements could be made here:
> The Spark session id could just be a counter that is incremented for each new 
> Spark session within a Hive session. Each Spark session is still globally 
> identifiable by its associated Hive session id + its own counter. This may 
> make more sense since the Hive session to Spark session relationship is 
> 1-to-many: a single Hive session can contain multiple Spark 
> sessions, and each Spark session must belong to a Hive session.
> Furthermore, we should include both the Hive session id and Spark session id 
> in the console logs + the Spark Web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18684) Race condition in RemoteSparkJobMonitor

2018-08-02 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18684:

Status: Open  (was: Patch Available)

> Race condition in RemoteSparkJobMonitor
> ---
>
> Key: HIVE-18684
> URL: https://issues.apache.org/jira/browse/HIVE-18684
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-18684.1.patch, HIVE-18684.2.patch, 
> HIVE-18684.3.patch
>
>
> There is a race condition in {{RemoteSparkJobMonitor}}. Sometimes the info in 
> {{RemoteSparkJobMonitor#startMonitor.STARTED}} gets printed out, sometimes it 
> doesn't. This can be easily verified by running a qtest on 
> {{TestMiniSparkOnYarnCliDriver}} and counting the number of times {{Query 
> Hive on Spark job}} is printed vs. the number of times {{Finished 
> successfully in}} gets printed.
> The issue is that {{RemoteSparkJobMonitor}} runs every one second, and checks 
> the state of {{JobHandle}}. Depending on the state, it prints out some 
> logging info. The content of the logs contain an implicit assumption that 
> logs in the {{STARTED}} state are printed before the logs in the 
> {{SUCCEEDED}} state. However, this isn't always the case. The state 
> transitions are driven by how long the remote Spark job takes to run, and it 
> it finishes within one second then the logs in the {{STARTED}} state never 
> printed.
> This can be confusing to users, and there is key debugging information that 
> is printed in the {{STARTED}} state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-08-02 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567402#comment-16567402
 ] 

Sahil Takiar commented on HIVE-19937:
-

Ping [~vihangk1]

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory; we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20280) JobResultSerializer uses wrong registration id in KyroMessageCodec

2018-07-31 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-20280:
---


> JobResultSerializer uses wrong registration id in KyroMessageCodec
> --
>
> Key: HIVE-20280
> URL: https://issues.apache.org/jira/browse/HIVE-20280
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Inside {{KryoMessageCodec}} the code:
> {code}
>   Kryo kryo = new Kryo();
>   int count = 0;
>   for (Class klass : messages) {
> kryo.register(klass, REG_ID_BASE + count);
> count++;
>   }
>   kryo.register(BaseProtocol.JobResult.class, new JobResultSerializer(), 
> count);
> {code}
> Uses the wrong registration id for the {{JobResultSerializer}} it should be 
> {{REG_ID_BASE + count}} not {{count}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

