[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18269: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the reviews! > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch, > HIVE-18269.03.patch, HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot > 2017-12-13 at 1.15.16 AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18269: Attachment: HIVE-18269.03.patch I cannot repro the new failures and they look like they are in unstable tests. Attaching the patch again just in case. > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch, > HIVE-18269.03.patch, HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot > 2017-12-13 at 1.15.16 AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18269: Attachment: HIVE-18269.02.patch A stupid bug... > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch, > HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 > AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18269: Status: Patch Available (was: Open) Done... I am trying to test it on cluster but the cluster I'm using is down > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-18269.01.patch, HIVE-18269.1.patch, > HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18269: Attachment: HIVE-18269.01.patch The patch. I have also attached a patch that I've tested on some cluster that limits the queue without replacing the linked list. New patch needs some extensive cluster testing. [~prasanth_j] [~gopalv] can you take a look? > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-18269.01.patch, HIVE-18269.1.patch, > HIVE-18269.bad.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18269: Attachment: HIVE-18269.bad.patch > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot > 2017-12-13 at 1.15.16 AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18269: Status: Open (was: Patch Available) > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16 > AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-18269: - Status: Patch Available (was: Open) > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16 > AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-18269: - Attachment: HIVE-18269.1.patch Haven't tested the patch yet on the repro cluster. Cluster is busy right now. Will test it on free time. > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16 > AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-18269: - Description: pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow indefinitely when Llap IO is faster than processing pipeline. Since we don't have backpressure to slow down the IO, this can lead to indefinite growth of pending data leading to severe GC pressure and eventually lead to OOM. This specific instance of LLAP was running on HDFS on top of EBS volume backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. was:pendingData linked list in Llap IO elevator (LlapRecordReader.java) may have grow indefinitely when Llap IO is faster than processing pipeline. Since we don't have backpressure to slow down the IO, this can lead to indefinite growth of pending data leading to severe GC pressure and eventually lead to OOM. > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: Screen Shot 2017-12-13 at 1.15.16 AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may grow > indefinitely when Llap IO is faster than processing pipeline. Since we don't > have backpressure to slow down the IO, this can lead to indefinite growth of > pending data leading to severe GC pressure and eventually lead to OOM. > This specific instance of LLAP was running on HDFS on top of EBS volume > backed by SSD. The query that triggered this is issue was ANALYZE STATISTICS > .. FOR COLUMNS which also gather bitvectors. Fast IO and Slow processing case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM
[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-18269: - Attachment: Screen Shot 2017-12-13 at 1.15.16 AM.png Attached images show the retained references. Entire 40GB heap was occupied by pendingData (some nodes are not expanded for brevity) > LLAP: Fast llap io with slow processing pipeline can lead to OOM > > > Key: HIVE-18269 > URL: https://issues.apache.org/jira/browse/HIVE-18269 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: Screen Shot 2017-12-13 at 1.15.16 AM.png > > > pendingData linked list in Llap IO elevator (LlapRecordReader.java) may have > grow indefinitely when Llap IO is faster than processing pipeline. Since we > don't have backpressure to slow down the IO, this can lead to indefinite > growth of pending data leading to severe GC pressure and eventually lead to > OOM. -- This message was sent by Atlassian JIRA (v6.4.14#64029)