[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Resolution: Fixed
Fix Version/s: 3.0.0
Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews!

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch,
> HIVE-18269.03.patch, HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot
> 2017-12-13 at 1.15.16 AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.
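
The missing backpressure described above can be illustrated with a bounded queue. The following is only a minimal sketch, not the committed patch and not the actual LlapRecordReader code: the class name, the `MAX_PENDING` limit, and the batch type are all assumptions made for illustration. A fast producer thread stands in for LLAP IO and a deliberately slow consumer stands in for the processing pipeline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration; names and limits are not taken from Hive.
class BackpressureSketch {
    // Assumed cap on the number of pending batches.
    static final int MAX_PENDING = 4;

    /** Runs a fast producer against a slow consumer; returns the largest queue size observed. */
    static int run(int batches) {
        BlockingQueue<int[]> pending = new ArrayBlockingQueue<>(MAX_PENDING);

        // "IO" side: produces as fast as it can, but put() blocks while the queue is full.
        Thread io = new Thread(() -> {
            try {
                for (int i = 0; i < batches; i++) {
                    pending.put(new int[1024]);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        io.start();

        int maxSeen = 0;
        try {
            // "Processing" side: slower than the producer.
            for (int i = 0; i < batches; i++) {
                maxSeen = Math.max(maxSeen, pending.size());
                pending.take();
                Thread.sleep(1); // simulate slow processing
            }
            io.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return maxSeen;
    }

    public static void main(String[] args) {
        System.out.println("max pending batches: " + run(50));
    }
}
```

With an unbounded list in place of the bounded queue, the producer would never block, and the number of pending batches would be limited only by the heap.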



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Attachment: HIVE-18269.03.patch

I cannot repro the new failures, and they appear to be in unstable tests.
Attaching the patch again just in case.

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch,
> HIVE-18269.03.patch, HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot
> 2017-12-13 at 1.15.16 AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Attachment: HIVE-18269.02.patch

A stupid bug...

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch,
> HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16
> AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Status: Patch Available  (was: Open)

Done... I am trying to test it on a cluster, but the cluster I'm using is down.

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.1.patch,
> HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Attachment: HIVE-18269.01.patch

The patch. I have also attached a patch, tested on a cluster, that limits the
queue without replacing the linked list.

The new patch needs extensive cluster testing.
[~prasanth_j] [~gopalv] can you take a look?
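
For readers without the attachments, the idea of limiting the queue while keeping the linked list can be sketched as below. This is a hypothetical illustration only, not the contents of either attached patch: the class name, the `limit` parameter, and the non-blocking `offer`/`poll` methods are assumptions. A producer that gets `false` back from `offer` would have to pause or retry, which is what slows the IO side down.

```java
import java.util.LinkedList;

// Hypothetical sketch; not taken from HIVE-18269.bad.patch or HIVE-18269.01.patch.
class BoundedPendingList<T> {
    private final LinkedList<T> pendingData = new LinkedList<>();
    private final int limit; // assumed maximum number of pending items

    BoundedPendingList(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    /** Adds an item unless the list is at its limit; returns false when the caller must back off. */
    synchronized boolean offer(T item) {
        if (pendingData.size() >= limit) {
            return false;
        }
        pendingData.addLast(item);
        return true;
    }

    /** Removes and returns the oldest item, or null if nothing is pending. */
    synchronized T poll() {
        return pendingData.pollFirst();
    }

    synchronized int size() {
        return pendingData.size();
    }
}
```

Keeping the `LinkedList` and adding only a size check is the smaller change; switching to a bounded concurrent queue moves the blocking logic into the data structure instead.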

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.1.patch,
> HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Attachment: HIVE-18269.bad.patch

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot
> 2017-12-13 at 1.15.16 AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Status: Open  (was: Patch Available)

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16
> AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2017-12-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18269:
Status: Patch Available  (was: Open)

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16
> AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2017-12-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18269:
Attachment: HIVE-18269.1.patch

I haven't tested the patch on the repro cluster yet; the cluster is busy right
now. I will test it when possible.

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16
> AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2017-12-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18269:
Description: 
The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java) may
grow indefinitely when LLAP IO is faster than the processing pipeline. Since we
don't have backpressure to slow down the IO, the pending data can grow without
bound, causing severe GC pressure and eventually an OOM.

This specific instance of LLAP was running on HDFS on top of an EBS volume
backed by SSD. The query that triggered this issue was ANALYZE STATISTICS ..
FOR COLUMNS, which also gathers bitvectors. This is a fast-IO, slow-processing
case.

  was:pendingData linked list in Llap IO elevator (LlapRecordReader.java) may 
have grow indefinitely when Llap IO is faster than processing pipeline. Since 
we don't have backpressure to slow down the IO, this can lead to indefinite 
growth of pending data leading to severe GC pressure and eventually lead to OOM.


> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: Screen Shot 2017-12-13 at 1.15.16 AM.png
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java)
> may grow indefinitely when LLAP IO is faster than the processing pipeline.
> Since we don't have backpressure to slow down the IO, the pending data can
> grow without bound, causing severe GC pressure and eventually an OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS
> .. FOR COLUMNS, which also gathers bitvectors. This is a fast-IO,
> slow-processing case.





[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2017-12-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18269:
Attachment: Screen Shot 2017-12-13 at 1.15.16 AM.png

The attached screenshot shows the retained references. The entire 40GB heap was
occupied by pendingData (some nodes are not expanded for brevity).

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: Screen Shot 2017-12-13 at 1.15.16 AM.png
>
> pendingData linked list in Llap IO elevator (LlapRecordReader.java) may have 
> grow indefinitely when Llap IO is faster than processing pipeline. Since we 
> don't have backpressure to slow down the IO, this can lead to indefinite 
> growth of pending data leading to severe GC pressure and eventually lead to 
> OOM.


