Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-18 Thread Stamatis Zampetakis
I looked more closely at the plans of query2 and query59, and there is something odd with the semijoins that appear in the plan. A possible workaround, until we fix the problem, would be to disable semijoins for these two queries: set hive.tez.dynamic.semijoin.reduction=false; Best, Stamatis
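A minimal sketch of scoping this workaround to a single session, so that only queries 2 and 59 run without dynamic semijoin reduction. It assumes the queries are run from the same Hive/Beeline session and that the cluster default for the property is true; check hive-site.xml before relying on the restore step.

```sql
-- Workaround suggested in the thread for TPC-DS queries 2 and 59:
-- turn off dynamic semijoin reduction for this session only.
set hive.tez.dynamic.semijoin.reduction=false;

-- ... run query 2 and query 59 here ...

-- Restore the setting for the remaining queries
-- (assumption: the configured default is true).
set hive.tez.dynamic.semijoin.reduction=true;
```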

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-18 Thread Stamatis Zampetakis
Hi Sungwoo, As far as query 14 is concerned, the problem is logged in HIVE-24167 [1]. There is also a PR [2] for reproducing the problem, so it should be feasible to find the offending commit with git bisect. For queries 2 and 59, I am also able to reproduce the behavior that you mentioned (same

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-17 Thread Sungwoo Park
> > > 1. With hive.optimize.shared.work.dppunion=true, query 2 and 59 fail. > Please see the attachment for stack traces. > > Even thru the exception seem to be a reoccurance of the previous issue - > existing checks + HIVE-24360 should have restricted all incorrect cases. > I built in some debug

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-16 Thread Zoltan Haindrich
Hey Sungwoo! On 11/13/20 6:47 PM, Sungwoo Park wrote: I have run another fresh TPC-DS test using the latest commit. Here is the summary: Thank you very much! > 1. With hive.optimize.shared.work.dppunion=true, query 2 and 59 fail. Please see the attachment for stack traces. Even though the

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-14 Thread Peter Varga
Hi Sungwoo, The query 14 result changed because of HIVE-24233. But this SemanticException is thrown to avoid the NullPointerException in SemanticAnalyzer; the root cause is still the earlier CBO failure, I guess with the same exception as in your previous test run. Peter On Fri, Nov 13, 2020 at 6:56

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-13 Thread Sungwoo Park
Hi Zoltan, I have run another fresh TPC-DS test using the latest commit. Here is the summary: Commits used: 1) Hive, master, e9f72e654750de208227d46a22e983413b080c6c (HIVE-24366, Thu Nov 12) 2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (CHANGES.txt updated with TEZ-4238, Thu Oct 8)

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-12 Thread Zoltan Haindrich
Hey Sungwoo! On 11/12/20 10:23 AM, Sungwoo Park wrote: Hi Zoltan, I used the same hive-site.xml for the previous test (which was okay) and the new test (which failed), so my guess is that it is perhaps due to a commit since the previous test. Let me try later to identify the commit that fails

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-12 Thread Sungwoo Park
Hi Zoltan, I used the same hive-site.xml for the previous test (which was okay) and the new test (which failed), so my guess is that it is perhaps due to a commit since the previous test. Let me try later to identify the commit that fails query 14, with the hope that identifying such a commit

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-09 Thread Zoltan Haindrich
Hey Sungwoo! Regarding Q14 / "java.lang.RuntimeException: equivalence mapping violation": from the stack trace you shared, it seems like the mapper has already seen both the filter and the AST node earlier, and they are in separate mapping groups. (Which is unfortunate.) I think it won't be

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-05 Thread Sungwoo Park
Hi Stamatis, Mustafa, Zoltán, This is the result of a new experiment. These are the changes that I made: 1. Reverted HIVE-24139. (It turns out that HIVE-24139 does not affect the result of the TPC-DS benchmark.) 2. Set hive.optimize.shared.work.dppunion to false in hive-site.xml. 3. Set
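For a quick check without editing hive-site.xml, the same setting can presumably be applied per session. This is a sketch under that assumption; the thread itself only reports changing hive-site.xml.

```sql
-- Session-level alternative (assumption, not tested in the thread) to
-- setting hive.optimize.shared.work.dppunion=false in hive-site.xml.
set hive.optimize.shared.work.dppunion=false;
```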

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-04 Thread Zoltán Haindrich
Hey All! I think this might be caused by a recent feature addition which became overzealous in some situations and results in an incorrect plan in some cases. I've already fixed the issue and it will go in as part of one of the followups. You could disable the feature by changing

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-04 Thread Mustafa IMAN
Hi Sungwoo, There is https://issues.apache.org/jira/browse/HIVE-23975 causing a regression in runtime. There is a ticket open to fix it (https://issues.apache.org/jira/browse/HIVE-24139), which is still in progress. You might want to revert HIVE-23975 before trying. On Wed, Nov 4, 2020 at 2:55 PM

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-04 Thread Stamatis Zampetakis
Hi Sungwoo, Personally, I would also be interested to see the results of these experiments if they are available somewhere. I didn't understand whether the queries are failing at runtime or at compile time. Are the above errors the only ones that you're getting? If you can reproduce the problem with a

Result of the TPC-DS benchmark on Hive master branch

2020-11-04 Thread Sungwoo Park
Hello, I have tested a recent commit of the master branch using the TPC-DS benchmark. I used Hive on Tez (not Hive-LLAP). The way I tested is: 1) create a database consisting of external tables from a 100GB TPC-DS text dataset 2) create a database consisting of ORC tables from the previous