[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269751#comment-15269751 ] Julian Hyde commented on HIVE-11165: I don't have an update. It's not obviously a thread-safety issue; the graph which is blowing up in that call stack is not shared between threads. More likely, the planner is firing rules over and over again until the graph of RelNodes gets really large. Thread-safety is one of several possible causes of that. > Calcite planner might have a thread-safety issue compiling in parallel > -- > > Key: HIVE-11165 > URL: https://issues.apache.org/jira/browse/HIVE-11165 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: RunJar-2015-06-30.snapshot > > > After about 6 minutes trying to plan a query, the HiveServer2 was killed to > restore functionality to a test run. > The HEP planner is stuck on a TopologicalOrder traversal and there were no > queries being fed into the HiveServer2 after it got stuck. > TPC-DS query13 was the query in question, at 4 way parallel, which triggered > the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269736#comment-15269736 ] Sergey Shelukhin commented on HIVE-11165: - Any update here? [~jcamachorodriguez] [~julianhyde] > Calcite planner might have a thread-safety issue compiling in parallel > -- > > Key: HIVE-11165 > URL: https://issues.apache.org/jira/browse/HIVE-11165 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: RunJar-2015-06-30.snapshot > > > After about 6 minutes trying to plan a query, the HiveServer2 was killed to > restore functionality to a test run. > The HEP planner is stuck on a TopologicalOrder traversal and there were no > queries being fed into the HiveServer2 after it got stuck. > TPC-DS query13 was the query in question, at 4 way parallel, which triggered > the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627028#comment-14627028 ] Laljo John Pullokkaran commented on HIVE-11165: --- I am going on vacation, [~jcamachorodriguez] Could you take a look? Calcite planner might have a thread-safety issue compiling in parallel -- Key: HIVE-11165 URL: https://issues.apache.org/jira/browse/HIVE-11165 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Jesus Camacho Rodriguez Attachments: RunJar-2015-06-30.snapshot After about 6 minutes trying to plan a query, the HiveServer2 was killed to restore functionality to a test run. The HEP planner is stuck on a TopologicalOrder traversal and there were no queries being fed into the HiveServer2 after it got stuck. TPC-DS query13 was the query in question, at 4 way parallel, which triggered the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621264#comment-14621264 ] Sergey Shelukhin commented on HIVE-11165: - [~jpullokkaran] [~pxiong] can you guys comment? Calcite planner might have a thread-safety issue compiling in parallel -- Key: HIVE-11165 URL: https://issues.apache.org/jira/browse/HIVE-11165 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Laljo John Pullokkaran Attachments: RunJar-2015-06-30.snapshot After about 6 minutes trying to plan a query, the HiveServer2 was killed to restore functionality to a test run. The HEP planner is stuck on a TopologicalOrder traversal and there were no queries being fed into the HiveServer2 after it got stuck. TPC-DS query13 was the query in question, at 4 way parallel, which triggered the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621262#comment-14621262 ] Sergey Shelukhin commented on HIVE-11165: - I have seen the following callstack that may be related, after ctrl-c-ing HiveServer2 that was stuck forever {noformat} Exception in thread HiveServer2-Handler-Pool: Thread-81 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.resize(HashMap.java:703) at java.util.HashMap.putVal(HashMap.java:662) at java.util.HashMap.put(HashMap.java:611) at java.util.HashSet.add(HashSet.java:219) at org.apache.calcite.util.graph.BreadthFirstIterator.reachable(BreadthFirstIterator.java:61) at org.apache.calcite.plan.hep.HepPlanner.collectGarbage(HepPlanner.java:900) at org.apache.calcite.plan.hep.HepPlanner.getGraphIterator(HepPlanner.java:427) at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:400) at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:285) at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:72) at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207) at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1035) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:964) ... {noformat} Calcite planner might have a thread-safety issue compiling in parallel -- Key: HIVE-11165 URL: https://issues.apache.org/jira/browse/HIVE-11165 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Attachments: RunJar-2015-06-30.snapshot After about 6 minutes trying to plan a query, the HiveServer2 was killed to restore functionality to a test run. The HEP planner is stuck on a TopologicalOrder traversal and there were no queries being fed into the HiveServer2 after it got stuck. TPC-DS query13 was the query in question, at 4 way parallel, which triggered the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621286#comment-14621286 ] Pengcheng Xiong commented on HIVE-11165: I attached query 13 here. I did not know the root cause yet but I saw lots of predicates. I suspect that this is related to the recent optimization on PPD? [~jpullokkaran]? {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and ss_sold_date between '2001-01-01' and '2001-12-31' and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} Calcite planner might have a thread-safety issue compiling in parallel -- Key: HIVE-11165 URL: https://issues.apache.org/jira/browse/HIVE-11165 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Laljo John Pullokkaran Attachments: RunJar-2015-06-30.snapshot After about 6 minutes trying to plan a query, the HiveServer2 was killed to restore functionality to a test run. The HEP planner is stuck on a TopologicalOrder traversal and there were no queries being fed into the HiveServer2 after it got stuck. TPC-DS query13 was the query in question, at 4 way parallel, which triggered the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621403#comment-14621403 ] Gopal V commented on HIVE-11165: [~pxiong]: this doesn't always happen - I have to increase concurrency to trigger this. The current test workaround is that there's a RandomOrderController in my tests so that it doesn't plan the same query (with the same vertex names, predicates etc) at the same time. Calcite planner might have a thread-safety issue compiling in parallel -- Key: HIVE-11165 URL: https://issues.apache.org/jira/browse/HIVE-11165 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Laljo John Pullokkaran Attachments: RunJar-2015-06-30.snapshot After about 6 minutes trying to plan a query, the HiveServer2 was killed to restore functionality to a test run. The HEP planner is stuck on a TopologicalOrder traversal and there were no queries being fed into the HiveServer2 after it got stuck. TPC-DS query13 was the query in question, at 4 way parallel, which triggered the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621431#comment-14621431 ] Laljo John Pullokkaran commented on HIVE-11165: --- [~gopalv] How did we get to the conclusion that Calcite is not thread safe? Calcite planner might have a thread-safety issue compiling in parallel -- Key: HIVE-11165 URL: https://issues.apache.org/jira/browse/HIVE-11165 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Laljo John Pullokkaran Attachments: RunJar-2015-06-30.snapshot After about 6 minutes trying to plan a query, the HiveServer2 was killed to restore functionality to a test run. The HEP planner is stuck on a TopologicalOrder traversal and there were no queries being fed into the HiveServer2 after it got stuck. TPC-DS query13 was the query in question, at 4 way parallel, which triggered the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel
[ https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621450#comment-14621450 ] Gopal V commented on HIVE-11165: [~jpullokkaran]: Take a look at the timeline between the threads in the attached snapshot (ThreadID 83, 84, 85) from the 12th min to the 15th minute. They're always switching into topological order and BFS garbage collection, until I killed it. Calcite planner might have a thread-safety issue compiling in parallel -- Key: HIVE-11165 URL: https://issues.apache.org/jira/browse/HIVE-11165 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Laljo John Pullokkaran Attachments: RunJar-2015-06-30.snapshot After about 6 minutes trying to plan a query, the HiveServer2 was killed to restore functionality to a test run. The HEP planner is stuck on a TopologicalOrder traversal and there were no queries being fed into the HiveServer2 after it got stuck. TPC-DS query13 was the query in question, at 4 way parallel, which triggered the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)