[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2016-05-03 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269751#comment-15269751
 ] 

Julian Hyde commented on HIVE-11165:


I don't have an update. It's not obviously a thread-safety issue; the graph 
which is blowing up in that call stack is not shared between threads. More 
likely, the planner is firing rules over and over again until the graph of 
RelNodes gets really large. Thread-safety is one of several possible causes of 
that.
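To make the "firing rules over and over" hypothesis concrete, here is a toy model. It is not Calcite's `HepPlanner`; the class and method names are hypothetical. It runs a rewrite loop until no rule fires, using a rule that always reports a change. Without a cap, such a loop never reaches a fixpoint, and the node list grows until memory runs out. The `matchLimit` parameter plays a role comparable to Calcite's `HepProgramBuilder.addMatchLimit(int)`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Calcite code) of a rule-driven rewriter that keeps going
 * while a rule reports a change. A rule that always produces a "new" node
 * never converges, so only the match limit stops the graph from growing.
 */
public class RuleFiringSketch {

    /** Hypothetical non-converging rule: every application appends a node. */
    static boolean applyNonConvergingRule(List<String> graph) {
        graph.add("Filter#" + graph.size());
        return true; // always claims it changed something
    }

    /** Applies the rule until it stops firing or matchLimit is reached. */
    static int runToFixpoint(List<String> graph, int matchLimit) {
        int matches = 0;
        while (matches < matchLimit && applyNonConvergingRule(graph)) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> graph = new ArrayList<>(List.of("Scan"));
        int matches = runToFixpoint(graph, 1000);
        System.out.println(matches + " rule applications, " + graph.size() + " nodes");
    }
}
```

Running `main` reports 1000 applications over 1001 nodes. Without the limit, the same loop would run until the heap is exhausted.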

> Calcite planner might have a thread-safety issue compiling in parallel
> --
>
> Key: HIVE-11165
> URL: https://issues.apache.org/jira/browse/HIVE-11165
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: RunJar-2015-06-30.snapshot
>
>
> After about 6 minutes of trying to plan a query, HiveServer2 was killed to
> restore functionality to a test run.
> The HEP planner was stuck on a TopologicalOrder traversal, and no
> queries were being fed into HiveServer2 after it got stuck.
> TPC-DS query13, run 4-way parallel, was the query that triggered
> the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2016-05-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269736#comment-15269736
 ] 

Sergey Shelukhin commented on HIVE-11165:

Any update here? [~jcamachorodriguez] [~julianhyde]



[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2015-07-14 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627028#comment-14627028
 ] 

Laljo John Pullokkaran commented on HIVE-11165:

I am going on vacation. [~jcamachorodriguez], could you take a look?



[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2015-07-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621264#comment-14621264
 ] 

Sergey Shelukhin commented on HIVE-11165:

[~jpullokkaran] [~pxiong] can you guys comment?




[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2015-07-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621262#comment-14621262
 ] 

Sergey Shelukhin commented on HIVE-11165:

After Ctrl-C-ing a HiveServer2 instance that was stuck indefinitely, I saw the
following call stack, which may be related:
{noformat}
Exception in thread "HiveServer2-Handler-Pool: Thread-81" java.lang.OutOfMemoryError: GC overhead limit exceeded
   at java.util.HashMap.resize(HashMap.java:703)
   at java.util.HashMap.putVal(HashMap.java:662)
   at java.util.HashMap.put(HashMap.java:611)
   at java.util.HashSet.add(HashSet.java:219)
   at org.apache.calcite.util.graph.BreadthFirstIterator.reachable(BreadthFirstIterator.java:61)
   at org.apache.calcite.plan.hep.HepPlanner.collectGarbage(HepPlanner.java:900)
   at org.apache.calcite.plan.hep.HepPlanner.getGraphIterator(HepPlanner.java:427)
   at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:400)
   at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:285)
   at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:72)
   at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
   at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1035)
   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:964)
...
{noformat}
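The top frames show the allocation failing inside the reachability pass that `HepPlanner.collectGarbage` runs through `BreadthFirstIterator.reachable`. The sketch below is a simplified, hypothetical mark-and-sweep over an adjacency map, loosely modeled on those two frames rather than taken from Calcite's source. It shows that each pass does work and allocates a set proportional to the live graph. If repeated rule firing keeps inflating the graph, every pass gets more expensive.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Simplified mark-and-sweep over a directed graph, loosely modeled on the
 * reachable()/collectGarbage() frames in the trace. Not Calcite's code.
 */
public class GraphGcSketch {

    /** Breadth-first search: returns every vertex reachable from root. */
    static <V> Set<V> reachable(Map<V, List<V>> edges, V root) {
        // This set grows with the live graph; in the trace, HashSet.add ->
        // HashMap.resize is where the allocation finally failed.
        Set<V> seen = new HashSet<>();
        Deque<V> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            V v = queue.poll();
            for (V next : edges.getOrDefault(v, List.of())) {
                if (seen.add(next)) { // true only on first visit
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    /** Drops vertices (keys) not reachable from root; returns how many. */
    static <V> int collectGarbage(Map<V, List<V>> edges, V root) {
        Set<V> live = reachable(edges, root);
        int before = edges.size();
        edges.keySet().retainAll(live);
        return before - edges.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("Project", List.of("Filter"));
        edges.put("Filter", List.of("Scan"));
        edges.put("Scan", List.of());
        edges.put("OldFilter", List.of("Scan")); // orphaned by an earlier rewrite
        System.out.println(collectGarbage(edges, "Project") + " vertex removed");
    }
}
```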



[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2015-07-09 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621286#comment-14621286
 ] 

Pengcheng Xiong commented on HIVE-11165:


I have attached query 13 here. I don't know the root cause yet, but I saw a lot of
predicates. I suspect this is related to the recent predicate pushdown (PPD) optimization.
[~jpullokkaran]?
{code}
select avg(ss_quantity)
   ,avg(ss_ext_sales_price)
   ,avg(ss_ext_wholesale_cost)
   ,sum(ss_ext_wholesale_cost)
 from store_sales
 ,store
 ,customer_demographics
 ,household_demographics
 ,customer_address
 ,date_dim
 where store.s_store_sk = store_sales.ss_store_sk
 and  store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001
 and ss_sold_date between '2001-01-01' and '2001-12-31'
 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'M'
  and customer_demographics.cd_education_status = '4 yr Degree'
  and store_sales.ss_sales_price between 100.00 and 150.00
  and household_demographics.hd_dep_count = 3   
 )or
 (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'D'
  and customer_demographics.cd_education_status = 'Primary'
  and store_sales.ss_sales_price between 50.00 and 100.00   
  and household_demographics.hd_dep_count = 1
 ) or 
 (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'U'
  and customer_demographics.cd_education_status = 'Advanced Degree'
  and store_sales.ss_sales_price between 150.00 and 200.00 
  and household_demographics.hd_dep_count = 1  
 ))
 and((store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('KY', 'GA', 'NM')
  and store_sales.ss_net_profit between 100 and 200  
 ) or
 (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('MT', 'OR', 'IN')
  and store_sales.ss_net_profit between 150 and 300  
 ) or
 (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('WI', 'MO', 'WV')
  and store_sales.ss_net_profit between 50 and 250  
 ))
;
{code}
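Query 13 repeats the same join conditions inside every branch of both `OR` blocks. A planner can use those shared conjuncts as join predicates only after factoring them out of the disjunction. The hypothetical sketch below shows the classic rewrite `(A and B and X) or (A and B and Y)` to `A and B and (X or Y)` on plain strings. Hive and Calcite work on expression trees instead, and nothing here is taken from their rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy factoring of conjuncts shared by every branch of an OR:
 * (A and B and X) or (A and B and Y)  ==>  A and B and (X or Y).
 * Predicates are plain strings, so only textually identical conjuncts match.
 */
public class OrFactoringSketch {

    /** Conjuncts present in every disjunct (assumes at least one disjunct). */
    static Set<String> commonFactors(List<Set<String>> disjuncts) {
        Set<String> common = new LinkedHashSet<>(disjuncts.get(0));
        for (Set<String> d : disjuncts) {
            common.retainAll(d);
        }
        return common;
    }

    /** Each disjunct with the common factors removed. */
    static List<Set<String>> residuals(List<Set<String>> disjuncts, Set<String> common) {
        List<Set<String>> out = new ArrayList<>();
        for (Set<String> d : disjuncts) {
            Set<String> rest = new LinkedHashSet<>(d);
            rest.removeAll(common);
            out.add(rest);
        }
        return out;
    }

    public static void main(String[] args) {
        // Abbreviated from the customer_address OR block of query 13.
        List<Set<String>> disjuncts = List.of(
            new LinkedHashSet<>(List.of("ss_addr_sk = ca_address_sk",
                "ca_country = 'United States'", "ca_state in ('KY','GA','NM')")),
            new LinkedHashSet<>(List.of("ss_addr_sk = ca_address_sk",
                "ca_country = 'United States'", "ca_state in ('MT','OR','IN')")),
            new LinkedHashSet<>(List.of("ss_addr_sk = ca_address_sk",
                "ca_country = 'United States'", "ca_state in ('WI','MO','WV')")));
        Set<String> common = commonFactors(disjuncts);
        System.out.println("common:    " + common);
        System.out.println("residuals: " + residuals(disjuncts, common));
    }
}
```

Because the comparison is textual, the third branch of the first `OR` block would not be factored. It writes `ss_cdemo_sk` where the other branches write `store_sales.ss_cdemo_sk`. A real implementation compares resolved expressions instead.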



[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2015-07-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621403#comment-14621403
 ] 

Gopal V commented on HIVE-11165:


[~pxiong]: this doesn't always happen; I have to increase concurrency to
trigger it.

The current workaround is a RandomOrderController in my tests, so that the
same query (with the same vertex names, predicates, etc.) is not planned by
several threads at the same time.
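The workaround relies on keeping concurrent threads off the same query. Random ordering only makes collisions unlikely. A deterministic alternative, swapped in here and not what the tests above use, gives each thread the query list rotated by its index. With at most as many threads as queries, threads that advance in lockstep then never compile the same query at the same step. The class name and query list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical harness helper (not the RandomOrderController mentioned above):
 * thread t runs the query list rotated left by t, so at step i it runs query
 * (i + t) mod n. Distinct per step only while threads <= queries.size().
 */
public class StaggeredOrderSketch {

    static List<List<String>> perThreadOrders(List<String> queries, int threads) {
        List<List<String>> orders = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            List<String> order = new ArrayList<>(queries);
            Collections.rotate(order, -t); // shift left by t positions
            orders.add(order);
        }
        return orders;
    }

    public static void main(String[] args) {
        // Illustrative query names only.
        List<String> queries = List.of("query13", "query19", "query42", "query52");
        for (List<String> order : perThreadOrders(queries, 4)) {
            System.out.println(order);
        }
    }
}
```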



[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2015-07-09 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621431#comment-14621431
 ] 

Laljo John Pullokkaran commented on HIVE-11165:

[~gopalv] How did we conclude that Calcite is not thread-safe?



[jira] [Commented] (HIVE-11165) Calcite planner might have a thread-safety issue compiling in parallel

2015-07-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621450#comment-14621450
 ] 

Gopal V commented on HIVE-11165:


[~jpullokkaran]: Take a look at the timeline of the threads in the
attached snapshot (thread IDs 83, 84, 85) from the 12th to the 15th minute.

They keep switching between the topological-order traversal and BFS garbage
collection until I killed the process.
