Re: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

2017-11-15 Thread Gopal Vijayaraghavan


> My guess is that the complex expression used in 
> VectorizedLogicBench.IfExprLongColumnLongColumnBench actually uses more CPU 
> than other expression.

If you have a -XX:+PrintAssembly dump (or run JMH with -prof perfasm), then we 
could see whether the JDK is autovectorizing that loop or not.

That's an IF condition evaluation loop written without a branch, specifically 
so that the JIT can speed it up.
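For readers unfamiliar with the pattern, here is a minimal Java sketch of a branch-free conditional select over long columns. It assumes the condition column holds only 0 or 1 and uses a mask trick to pick between the two inputs; the class and method names are hypothetical, and the linked IfExprLongColumnLongColumn source may be written differently. Straight-line loops like this are the kind that C2 may be able to turn into SIMD code, which is what a PrintAssembly or perfasm dump would confirm.

```java
/** Hypothetical branch-free IF(cond, thenCol, elseCol) kernel over long columns. */
final class BranchlessIfExpr {
    /** Assumes every cond[i] is exactly 0 or 1. */
    static void ifExpr(long[] cond, long[] thenCol, long[] elseCol, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            // mask is 0 when cond == 1 and all ones (-1) when cond == 0
            long mask = cond[i] - 1;
            // cond == 1 keeps thenCol[i]; cond == 0 keeps elseCol[i]; no branch taken
            out[i] = (~mask & thenCol[i]) | (mask & elseCol[i]);
        }
    }

    public static void main(String[] args) {
        long[] out = new long[4];
        ifExpr(new long[] {1, 0, 1, 0}, new long[] {10, 20, 30, 40},
               new long[] {-1, -2, -3, -4}, out, 4);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```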

Cheers,
Gopal








Re: Review Request 63711: HIVE-17528 Add more q-tests for Hive-on-Spark with Parquet vectorized reader

2017-11-15 Thread cheng xu


> On Nov. 16, 2017, 4:44 a.m., Vihang Karajgaonkar wrote:
> > ql/src/test/results/clientpositive/llap/sysdb.q.out
> > Lines 2236 (patched)
> > 
> >
> > not sure why this file is changing. Do you know?

This test case is actually failing in pre-commit because of a difference in 
output, and locally I am not able to reproduce the output in the golden files. 
So I would prefer to upload the local version instead and file another JIRA for 
this failing test case. Any thoughts on that?


> On Nov. 16, 2017, 4:44 a.m., Vihang Karajgaonkar wrote:
> > ql/src/test/results/clientpositive/llap/sysdb.q.out
> > Line 3646 (original), 3706 (patched)
> > 
> >
> > how come data values are changing here?

Same as above.


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63711/#review191092
---


On Nov. 15, 2017, 9:34 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63711/
> ---
> 
> (Updated Nov. 15, 2017, 9:34 a.m.)
> 
> 
> Review request for hive and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Most of the vectorization-related q-tests operate on ORC tables using Tez. It 
> would be good to add more coverage for a different combination of engine and 
> file format. We can model the existing q-tests using Parquet tables and run them 
> using TestSparkCliDriver.
> 
> 
> Diffs
> -
> 
>   data/scripts/q_test_cleanup.sql 4620dcd 
>   data/scripts/q_test_init.sql f763c12 
>   itests/src/test/resources/testconfiguration.properties 1d16b65 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java f1d90ff 
>   pom.xml dfb29ce 
>   ql/src/test/queries/clientpositive/parquet_read_backward_compatible_files.q 
> 0abbc2f 
>   ql/src/test/queries/clientpositive/parquet_vectorization_0.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_10.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_11.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_12.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_13.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_14.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_15.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_16.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_17.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_5.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_6.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_7.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_8.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_9.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_decimal_date.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_div0.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_limit.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_nested_udf.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_not.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_offset_limit.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_part.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_part_project.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_part_varchar.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_pushdown.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/add_part_exist.q.out f8d50ca 
>   ql/src/test/results/clientpositive/alter1.q.out c2efbe5 
>   ql/src/test/results/clientpositive/alter2.q.out 18032ac 
>   ql/src/test/results/clientpositive/alter3.q.out 3bd7288 
>   ql/src/test/results/clientpositive/alter4.q.out ddcb0ed 
>   ql/src/test/results/clientpositive/alter5.q.out 1eb24c2 
>   ql/src/test/results/clientpositive/alter_index.q.out bca4e12 
>   

[jira] [Created] (HIVE-18080) Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled

2017-11-15 Thread liyunzhang (JIRA)
liyunzhang created HIVE-18080:
-

 Summary: Performance degradation on 
VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
 Key: HIVE-18080
 URL: https://issues.apache.org/jira/browse/HIVE-18080
 Project: Hive
  Issue Type: Bug
Reporter: liyunzhang


Used an Intel Xeon(R) Platinum 8180 CPU to test the performance of 
[AVX512|https://en.wikipedia.org/wiki/AVX-512].
{code}
#cat /proc/cpuinfo |grep "model name"|head -n 1
model name  : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
{code}
Before that, I compiled Hive with JDK 9, since JDK 9 enables AVX512.
I used the Hive microbenchmarks (HIVE-10189) to evaluate the performance 
improvement. Performance seems to improve by 20%+ in the cases in 
{{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}},
except 
{{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}}
and
{{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}.
With the Skylake CPU, the VectorizedLogicBench results are as follows:
||Benchmark||AVX2 us/op||AVX512 us/op||(AVX2-AVX512)/AVX2||
|ColAndColBench|122510|87014|28.9%|
|IfExprLongColumnLongColumnBench|1325759|1436073|-8.3%|
|IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|-5.9%|
|IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|-5.8%|
|NotColBench|77042.83|51513.28|33.1%|
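The last column can be recomputed from the us/op values. This small Java sketch (hypothetical class name, numbers copied from the table) prints the relative change for each row; a positive value means AVX512 is faster, a negative one means it is slower.

{code}
/** Hypothetical helper: recomputes (AVX2 - AVX512) / AVX2 for each table row. */
final class AvxRelativeChange {
    /** Positive = AVX512 needs fewer us/op (faster); negative = AVX512 is slower. */
    static double relativeChange(double avx2UsPerOp, double avx512UsPerOp) {
        return (avx2UsPerOp - avx512UsPerOp) / avx2UsPerOp;
    }

    public static void main(String[] args) {
        String[] names = {"ColAndColBench", "IfExprLongColumnLongColumnBench",
            "IfExprLongColumnRepeatingLongColumnBench",
            "IfExprRepeatingLongColumnLongColumnBench", "NotColBench"};
        double[] avx2 = {122510, 1325759, 1397447, 1401164, 77042.83};
        double[] avx512 = {87014, 1436073, 1480450, 1483062, 51513.28};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-42s %6.1f%%%n", names[i], 100 * relativeChange(avx2[i], avx512[i]));
        }
    }
}
{code}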

There is a degradation in 
IfExprLongColumnLongColumnBench, IfExprLongColumnRepeatingLongColumnBench and 
IfExprRepeatingLongColumnLongColumnBench, and I am very confused about why 
these IfExpr cases degrade.

Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to reduce 
the impact of dynamic CPU frequency scaling.
My script:
{code}
export JAVA_HOME=/home/zly/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib

# Three AVX512 runs (-XX:UseAVX=3), each pinned to CPU 1 after launch.
# The benchmark regex is quoted so the shell does not glob-expand it.
for i in 0 1 2; do
  java -server -XX:UseAVX=3 -jar benchmarks.jar \
    'org.apache.hive.benchmark.vectorization.VectorizedLogicBench.*' \
    -wi 10 -i 20 -f 1 -bm avgt -tu us > log.logic.avx3.single.$i &
  pid=$!
  taskset -cp 1 $pid
  wait $pid
done

# The same three runs restricted to AVX2 (-XX:UseAVX=2).
for i in 0 1 2; do
  java -server -XX:UseAVX=2 -jar benchmarks.jar \
    'org.apache.hive.benchmark.vectorization.VectorizedLogicBench.*' \
    -wi 10 -i 20 -f 1 -bm avgt -tu us > log.logic.avx2.single.$i &
  pid=$!
  taskset -cp 1 $pid
  wait $pid
done

{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2017-11-15 Thread Gopal V (JIRA)
Gopal V created HIVE-18079:
--

 Summary: Statistics: Allow HyperLogLog to be merged to the 
lowest-common-denominator bit-size
 Key: HIVE-18079
 URL: https://issues.apache.org/jira/browse/HIVE-18079
 Project: Hive
  Issue Type: Improvement
Reporter: Gopal V


HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
mathematical hash distribution & construction.

Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second scan 
over the data-set.
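To illustrate why no second scan is needed, here is a minimal Java sketch of folding a precision-14 sketch (2^14 registers) into a precision-10 one. It assumes a layout in which the register index is the top p bits of a 64-bit hash and each register stores 1 + the number of leading zeros in the remaining bits; Hive's own HyperLogLog encoding may differ, and the class name and the SplitMix64 stand-in hash are hypothetical. The four index bits that are dropped become the leading bits of the rank bit-string, so each new register can be computed from the old registers alone.

{code}
import java.util.Arrays;

/** Hypothetical dense HLL used only to illustrate folding 2^p registers to 2^q. */
final class HllFoldSketch {
    final int p;              // precision: the sketch has 2^p registers
    final byte[] registers;   // each register holds the max rank seen, 0 = empty

    HllFoldSketch(int p) {
        if (p < 1 || p > 30) {
            throw new IllegalArgumentException("p out of range: " + p);
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** Assumed layout: index = top p hash bits, rank = 1 + leading zeros of the rest. */
    void addHash(long hash) {
        int index = (int) (hash >>> (64 - p));
        long rest = hash << p;                                   // remaining 64 - p bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    /** Squash to a lower precision q using only the registers (no rescan of the data). */
    HllFoldSketch foldTo(int q) {
        if (q > p) {
            throw new IllegalArgumentException("can only fold to a smaller precision");
        }
        int shift = p - q;                       // low index bits that move into the rank part
        int mask = (1 << shift) - 1;
        HllFoldSketch out = new HllFoldSketch(q);
        for (int j = 0; j < registers.length; j++) {
            int r = registers[j];
            if (r == 0) {
                continue;                        // nothing was hashed to this register
            }
            int newIndex = j >>> shift;
            int dropped = j & mask;              // these bits now lead the rank bit-string
            int newRank = dropped != 0
                    ? Integer.numberOfLeadingZeros(dropped) - (32 - shift) + 1
                    : shift + r;
            if (newRank > out.registers[newIndex]) {
                out.registers[newIndex] = (byte) newRank;
            }
        }
        return out;
    }

    /** SplitMix64 finalizer, used here as a stand-in 64-bit hash. */
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        HllFoldSketch p14 = new HllFoldSketch(14);
        HllFoldSketch p10 = new HllFoldSketch(10);
        for (long i = 0; i < 100_000; i++) {
            long h = mix64(i);
            p14.addHash(h);
            p10.addHash(h);
        }
        // Folding 14 -> 10 should reproduce the sketch built directly at p = 10.
        System.out.println(Arrays.equals(p14.foldTo(10).registers, p10.registers));
    }
}
{code}

Under this assumed layout the fold is exact: the folded registers equal those of a sketch built directly at p = 10 from the same hashes, which is what {{main}} checks.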



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 63806: HIVE-16756 : Vectorization: LongColModuloLongColumn throws java.lang.ArithmeticException: / by zero

2017-11-15 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63806/
---

(Updated Nov. 16, 2017, 5:21 a.m.)


Review request for hive, Aihua Xu and Matt McCline.


Changes
---

Added a test case.


Bugs: HIVE-16756
https://issues.apache.org/jira/browse/HIVE-16756


Repository: hive-git


Description
---

HIVE-16756 : Vectorization: LongColModuloLongColumn throws 
java.lang.ArithmeticException: / by zero


Diffs (updated)
-

  ql/src/gen/vectorization/ExpressionTemplates/ColumnDivideColumn.txt 
8b586b1f00ce7d6081f973a5736100d8941f79bc 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/LongColModuloLongColumn.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPMod.java 
6d3e82e9b96e012d875d947fa397c6c67df6a931 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java 
21f6540512ec795171014d87a6fde0d0ea5f23cf 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorArithmeticExpressions.java
 ea06ea0aefcf5e36f204eaad78131860b44298ae 
  ql/src/test/queries/clientpositive/vectorization_div0.q 
025d457807dd0642965a81c6b093e421c4acd0f8 
  ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 
631b0723fb0d7ab011ad2bfd7be4b33d11d76b1c 
  ql/src/test/results/clientpositive/tez/vectorization_div0.q.out 
6c3354cb4a8cd439d86df7e6b0cf759ea4c04cd0 
  ql/src/test/results/clientpositive/vectorization_div0.q.out 
97f1687b85193e681f26c61107a6d9266c1d87a2 
  vector-code-gen/src/org/apache/hadoop/hive/tools/GenVectorCode.java 
e58d4e91938dc266111042fe98b05a3d9c6fc5e9 


Diff: https://reviews.apache.org/r/63806/diff/3/

Changes: https://reviews.apache.org/r/63806/diff/2-3/


Testing
---


Thanks,

Vihang Karajgaonkar



RE: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

2017-11-15 Thread Zhang, Liyun
Hi Gopal:
Thanks a lot for your reply!
Do you mean that if I restrict VectorizedLogicBench.IfExprLongColumnLongColumnBench 
to run on only one CPU, the variation will be small? If so, the variation did 
become smaller after I used taskset -cp 1 $pid. But I am confused: if all the 
tests in VectorizedLogicBench are well pipelined and vectorized, why is there no 
large variation for the other tests in VectorizedLogicBench? My guess is that 
the complex expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench 
actually uses more CPU than the other expressions.

The expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprLongColumnLongColumn.java#L90





Best Regards
Kelly Zhang/Zhang,Liyun




 
-----Original Message-----
From: Gopal Vijayaraghavan [mailto:gop...@apache.org] 
Sent: Thursday, November 16, 2017 5:40 AM
To: dev@hive.apache.org
Cc: Zhang, Liyun ; Teddy Choi 
Subject: Re: Anyone knows the problem I found in 
VectorizedLogicBench.IfExprLongColumnLongColumnBench?

Hi,

>   You see that there is a great float for 
> IfExprLongColumnLongColumnBench.bench, the  float is 583775 and the average 
> value is 1621602. 

In my tests, the single-core tests tended to have huge variations on Intel with 
Turbo Boost.

CPU operations which are fast when stressing the CPU in single-threaded mode tended 
to get really slow when the other cores spin up and hit thermal limits.

For most memory-bound operations this is not easily visible, but the better 
pipelined and vectorized the loops get, the worse the impact of dynamic CPU 
frequency scaling.

Can you collect the active CPU frequency when running this benchmark and use 
"taskset -c 1" to force the run to stick to a single CPU?

Cheers,
Gopal





Review Request 63864: HIVE-18072 WM - fix various bugs based on cluster testing - part 2

2017-11-15 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63864/
---

Review request for hive and Prasanth_J.


Repository: hive-git


Description
---

See the JIRA.


Diffs
-

  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java
 a02a414a76 
  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
 9dc521e39e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java 
a1775cd6bb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/UserPoolMapping.java 
851245c154 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
1fe5859490 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 
5ba6639e0c 


Diff: https://reviews.apache.org/r/63864/diff/1/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-18078) WM getSession needs some retry logic

2017-11-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18078:
---

 Summary: WM getSession needs some retry logic
 Key: HIVE-18078
 URL: https://issues.apache.org/jira/browse/HIVE-18078
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin


When we get a bad session (e.g. no registry info because the AM has gone 
catatonic), the failure of the timeout future fails the getSession call.
The retry model in TezTask is that it gets a session (which in the original 
model can be completely unusable, but we still get the object), and then retries 
(reopens) it if it's a lemon. If the reopen fails, we fail.
getSession is not covered by this retry scheme, and should thus do its own 
retries (or the retry logic needs to be changed).
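A minimal sketch of the "do its own retries" option, written as a generic bounded-retry helper around a session-acquisition call. Everything here is hypothetical (class, method, linear backoff policy); it is not WorkloadManager's API and does not settle whether the retry belongs in getSession or in TezTask.

{code}
import java.util.concurrent.Callable;

/** Hypothetical bounded-retry helper; not an existing WorkloadManager API. */
final class SessionRetry {
    static <T> T getWithRetries(Callable<T> getSession, int maxAttempts, long backoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return getSession.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // never swallow interrupts
                throw e;
            } catch (Exception e) {
                last = e;                              // e.g. the timeout on a catatonic AM
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt); // linear backoff before the next try
                }
            }
        }
        throw last;                                    // every attempt failed
    }
}
{code}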



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18077) Vectorization: Add string conversion case for UDFToDouble

2017-11-15 Thread Matt McCline (JIRA)
Matt McCline created HIVE-18077:
---

 Summary: Vectorization: Add string conversion case for UDFToDouble
 Key: HIVE-18077
 URL: https://issues.apache.org/jira/browse/HIVE-18077
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 3.0.0


Add string to float/double vectorization.
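A minimal sketch of what a vectorized string-to-double cast could look like, assuming strings are stored as byte slices (bytes, start, length per row) and that unparseable input becomes NULL. The class name is hypothetical and this is not Hive's implementation; note that Java's {{Double.parseDouble}} also accepts forms such as {{NaN}}, {{Infinity}} and {{1d}}, which may not match Hive's own string-to-double rules.

{code}
import java.nio.charset.StandardCharsets;

/** Hypothetical string -> double column cast; malformed input becomes NULL. */
final class CastStringToDoubleSketch {
    static void cast(byte[][] vector, int[] start, int[] length, int n,
                     double[] out, boolean[] outIsNull) {
        for (int i = 0; i < n; i++) {
            String s = new String(vector[i], start[i], length[i], StandardCharsets.UTF_8);
            try {
                out[i] = Double.parseDouble(s);    // trims surrounding whitespace
                outIsNull[i] = false;
            } catch (NumberFormatException e) {
                out[i] = Double.NaN;               // placeholder value under a NULL flag
                outIsNull[i] = true;
            }
        }
    }
}
{code}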



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Review Request 63855: HIVE-17717

2017-11-15 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63855/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-17717
https://issues.apache.org/jira/browse/HIVE-17717


Repository: hive-git


Description
---

HIVE-17717


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
23b93cd29c1c145968aee847044a1f6100fdde07 
  ql/src/test/queries/clientpositive/druid_basic3.q PRE-CREATION 
  ql/src/test/results/clientpositive/druid_basic3.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/63855/diff/1/


Testing
---


Thanks,

Jesús Camacho Rodríguez



Review Request 63854: CachedStore: Have a whitelist/blacklist config to allow selective caching of tables/partitions and allow read while prewarming

2017-11-15 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63854/
---

Review request for hive, Daniel Dai, Sergey Shelukhin, and Thejas Nair.


Bugs: HIVE-18056
https://issues.apache.org/jira/browse/HIVE-18056


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-18056


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java bd25bc7cad 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
 c61f27b326 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
 a76b8480b0 


Diff: https://reviews.apache.org/r/63854/diff/1/


Testing
---


Thanks,

Vaibhav Gumashta



[ANNOUNCE] Apache Hive 2.3.2 Released

2017-11-15 Thread Sahil Takiar
The Apache Hive team is proud to announce the release of Apache Hive
version 2.3.2.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
  frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.3.2 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342053&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


Re: Review Request 63806: HIVE-16756 : Vectorization: LongColModuloLongColumn throws java.lang.ArithmeticException: / by zero

2017-11-15 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63806/
---

(Updated Nov. 15, 2017, 10:50 p.m.)


Review request for hive, Aihua Xu and Matt McCline.


Changes
---

The fix was needed only for the long % long expression. Removed the changes to 
LongColDivideLongColumn.java and removed the unnecessary file 
ColumnDivideLong.txt.
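In Java, integer {{%}} with a zero divisor throws {{ArithmeticException: / by zero}}, whereas floating-point division by zero does not throw, which may be why only the long % long path needed the fix. Below is a minimal sketch of a zero-safe kernel; the class and method are hypothetical (not Hive's LongColModuloLongColumn), and it assumes SQL semantics in which x % 0 returns NULL. Hive's real vector expressions also handle isRepeating, noNulls and selected-row batches, which are omitted here.

{code}
/** Hypothetical, simplified long % long column kernel; x % 0 is assumed to yield NULL. */
final class LongModuloSketch {
    static void modulo(long[] a, long[] b, long[] out, boolean[] outIsNull, int n) {
        for (int i = 0; i < n; i++) {
            boolean zeroDivisor = b[i] == 0;
            // Substitute 1 for a zero divisor so Java's % never throws
            // ArithmeticException; the slot is then flagged as NULL.
            out[i] = a[i] % (zeroDivisor ? 1 : b[i]);
            outIsNull[i] = zeroDivisor;
        }
    }
}
{code}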


Summary (updated)
-

HIVE-16756 : Vectorization: LongColModuloLongColumn throws 
java.lang.ArithmeticException: / by zero


Bugs: HIVE-16756
https://issues.apache.org/jira/browse/HIVE-16756


Repository: hive-git


Description (updated)
---

HIVE-16756 : Vectorization: LongColModuloLongColumn throws 
java.lang.ArithmeticException: / by zero


Diffs (updated)
-

  ql/src/gen/vectorization/ExpressionTemplates/ColumnDivideColumn.txt 
8b586b1f00ce7d6081f973a5736100d8941f79bc 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/LongColModuloLongColumn.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPMod.java 
6d3e82e9b96e012d875d947fa397c6c67df6a931 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java 
21f6540512ec795171014d87a6fde0d0ea5f23cf 
  ql/src/test/queries/clientpositive/vectorization_div0.q 
025d457807dd0642965a81c6b093e421c4acd0f8 
  ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 
631b0723fb0d7ab011ad2bfd7be4b33d11d76b1c 
  ql/src/test/results/clientpositive/tez/vectorization_div0.q.out 
6c3354cb4a8cd439d86df7e6b0cf759ea4c04cd0 
  ql/src/test/results/clientpositive/vectorization_div0.q.out 
97f1687b85193e681f26c61107a6d9266c1d87a2 
  vector-code-gen/src/org/apache/hadoop/hive/tools/GenVectorCode.java 
e58d4e91938dc266111042fe98b05a3d9c6fc5e9 


Diff: https://reviews.apache.org/r/63806/diff/2/

Changes: https://reviews.apache.org/r/63806/diff/1-2/


Testing
---


Thanks,

Vihang Karajgaonkar



[jira] [Created] (HIVE-18076) killquery doesn't actually work for non-trigger WM kills, or the error message is not propagated

2017-11-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18076:
---

 Summary: killquery doesn't actually work for non-trigger WM kills, 
or the error message is not propagated
 Key: HIVE-18076
 URL: https://issues.apache.org/jira/browse/HIVE-18076
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Not sure what's wrong with it, need to take a look.
It dumps a lot of info about everything being cancelled, instead of a nice 
message like triggers do.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18075) verify commands on a cluster

2017-11-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18075:
---

 Summary: verify commands on a cluster
 Key: HIVE-18075
 URL: https://issues.apache.org/jira/browse/HIVE-18075
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin


I was running the commands on a cluster with a potentially slightly outdated 
version of the DB, but with recent master code plus the pools commands patch. I 
hit the following issues:
# Cannot drop pool or RP with a mapping (see also 3).
# Cannot drop pool that is set as default (probably correct, but the error 
message is bad).
# When I dropped an RP with a mapping and then created it again with the same 
name, the pool creation in that RP would fail with an error that a unique query 
returned multiple results. In the DB, there were actually 2 RPs with the same 
name. Not sure exactly how that happened; there might have been intermediate 
states, but I didn't modify MySQL by hand. I think the name uniqueness is either 
missing from some script or doesn't work.
# Setting RP default pool no longer works. I think I might have broken it with 
one of the rebases in that area, but it could also be something else (or like 
other things, it works in q tests but not on cluster for whatever reason).
# Resource plan rename doesn't check the disabled state. It probably should. 
Other commands need to be checked as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18074) do not show rejected tasks as killed in query UI

2017-11-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18074:
---

 Summary: do not show rejected tasks as killed in query UI
 Key: HIVE-18074
 URL: https://issues.apache.org/jira/browse/HIVE-18074
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Tasks rejected from LLAP because the cluster is full are shown as killed tasks 
in the command-line query UI (CLI and beeline). This shouldn't really happen: 
killed tasks in the container case mean something else, and this scenario 
doesn't exist because the AM doesn't continuously try to queue tasks. We could 
change the LLAP queue to use a sort of pull model (which would also allow for 
better duplicate scheduling), but for now we should fix the UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18073) AM may assert when duck count for it is reduced

2017-11-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18073:
---

 Summary: AM may assert when duck count for it is reduced
 Key: HIVE-18073
 URL: https://issues.apache.org/jira/browse/HIVE-18073
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin


Sometimes it asserts that it doesn't have so many ducks to give away. This 
should never happen, need to debug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18072) WM - fix various bugs based on cluster testing - part 2

2017-11-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18072:
---

 Summary: WM - fix various bugs based on cluster testing - part 2
 Key: HIVE-18072
 URL: https://issues.apache.org/jira/browse/HIVE-18072
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18071) add HS2 jmx information about pools and current resource plan

2017-11-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18071:
---

 Summary: add HS2 jmx information about pools and current resource 
plan
 Key: HIVE-18071
 URL: https://issues.apache.org/jira/browse/HIVE-18071
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18070) Merge partitions NDV estimators in batches

2017-11-15 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-18070:
--

 Summary: Merge partitions NDV estimators in batches
 Key: HIVE-18070
 URL: https://issues.apache.org/jira/browse/HIVE-18070
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

2017-11-15 Thread Gopal Vijayaraghavan
Hi,

>   You see that there is a great float for 
> IfExprLongColumnLongColumnBench.bench, the  float is 583775 and the average 
> value is 1621602. 

In my tests, the single-core tests tended to have huge variations on Intel with 
Turbo Boost.

CPU operations which are fast when stressing the CPU in single-threaded mode tended 
to get really slow when the other cores spin up and hit thermal limits.

For most memory-bound operations this is not easily visible, but the better 
pipelined and vectorized the loops get, the worse the impact of dynamic CPU 
frequency scaling.

Can you collect the active CPU frequency when running this benchmark and use 
"taskset -c 1" to force the run to stick to a single CPU?

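One way to sample the active frequency while the benchmark runs is to read the "cpu MHz" lines from /proc/cpuinfo on x86 Linux. The Java sketch below parses that text into a per-CPU map; the class name is hypothetical, the field layout is assumed to be the usual x86 one, and each reading is only an approximate snapshot, so sampling it repeatedly during the run (and looking at the pinned CPU, here 1) is advisable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for per-CPU "cpu MHz" values in x86 Linux /proc/cpuinfo text. */
final class CpuMhzParser {
    static Map<Integer, Double> parse(String cpuinfo) {
        Map<Integer, Double> mhzByCpu = new LinkedHashMap<>();
        int currentCpu = -1;
        for (String line : cpuinfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                              // blank separator between CPUs
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (key.equals("processor")) {
                currentCpu = Integer.parseInt(value);  // start of a new CPU block
            } else if (key.equals("cpu MHz") && currentCpu >= 0) {
                mhzByCpu.put(currentCpu, Double.parseDouble(value));
            }
        }
        return mhzByCpu;
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("/proc/cpuinfo")));
        System.out.println(parse(text));               // one snapshot; repeat during the run
    }
}
```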
Cheers,
Gopal





RE: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

2017-11-15 Thread Zhang, Liyun
Hi all:
I am now using the Hive microbenchmarks (HIVE-10189) to test the performance 
improvement of AVX2 and AVX512.
When I test VectorizedLogicBench.IfExprLongColumnLongColumnBench, I get the 
following results:
When enabling AVX512:
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench
 avgt   10  1621602.652 ± 583775.700  us/op
When enabling AVX2:
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench
 avgt   10  1817855.876 ± 49289.868  us/op

As you can see, the error for IfExprLongColumnLongColumnBench.bench is very 
large: ±583775 against an average of 1621602, which shows that the measured 
values are widely scattered. @Teddy, as you are more familiar with the code, 
do you know why the results are so scattered? Does this mean the benchmark 
results are not stable?
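A quick way to quantify the spread is to express the ± error as a fraction of the score and check whether the two intervals overlap. The Java sketch below does that arithmetic on the two results quoted above; the class name is hypothetical, and it assumes the ± value is the JMH confidence-interval half-width (99.9% by default, to my knowledge).

```java
/** Hypothetical worked arithmetic on the two JMH results quoted above. */
final class JmhErrorCheck {
    /** Error half-width as a fraction of the score. */
    static double relativeError(double score, double error) {
        return error / score;
    }

    /** True if [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] overlap. */
    static boolean intervalsOverlap(double s1, double e1, double s2, double e2) {
        return s1 - e1 <= s2 + e2 && s2 - e2 <= s1 + e1;
    }

    public static void main(String[] args) {
        double avx512 = 1621602.652, avx512Err = 583775.700;   // us/op, from the mail
        double avx2 = 1817855.876, avx2Err = 49289.868;
        System.out.printf("AVX512 relative error: %.1f%%%n", 100 * relativeError(avx512, avx512Err));
        System.out.printf("AVX2 relative error:   %.1f%%%n", 100 * relativeError(avx2, avx2Err));
        System.out.println("intervals overlap: " + intervalsOverlap(avx512, avx512Err, avx2, avx2Err));
    }
}
```

For these numbers the AVX512 error is about 36% of the score versus about 2.7% for AVX2, and the two intervals overlap, so this run alone does not show which setting is faster.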



I'd appreciate any feedback!
Best Regards
Kelly Zhang/Zhang,Liyun




[jira] [Created] (HIVE-18069) MetaStoreDirectSql to get tables has misplaced comma

2017-11-15 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-18069:
--

 Summary: MetaStoreDirectSql to get tables has misplaced comma
 Key: HIVE-18069
 URL: https://issues.apache.org/jira/browse/HIVE-18069
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Aihua Xu
Assignee: Jesus Camacho Rodriguez


Introduced by HIVE-15436.

Cc [~aihuaxu]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 63711: HIVE-17528 Add more q-tests for Hive-on-Spark with Parquet vectorized reader

2017-11-15 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63711/#review191092
---



Thanks for the patch, Ferdinand. Just one comment below; the rest of the qfiles 
look good to me. We should also do a separate exercise to verify that the output 
of the queries in these qfiles matches when vectorization is disabled, to make 
sure that the golden files are not hiding any bugs.


ql/src/test/results/clientpositive/llap/sysdb.q.out
Lines 2236 (patched)


not sure why this file is changing. Do you know?



ql/src/test/results/clientpositive/llap/sysdb.q.out
Line 3646 (original), 3706 (patched)


how come data values are changing here?


- Vihang Karajgaonkar


On Nov. 15, 2017, 1:34 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63711/
> ---
> 
> (Updated Nov. 15, 2017, 1:34 a.m.)
> 
> 
> Review request for hive and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Most of the vectorization-related q-tests operate on ORC tables using Tez. It 
> would be good to add more coverage for a different combination of engine and 
> file format. We can model the existing q-tests using Parquet tables and run them 
> using TestSparkCliDriver.
> 
> 
> Diffs
> -
> 
>   data/scripts/q_test_cleanup.sql 4620dcd 
>   data/scripts/q_test_init.sql f763c12 
>   itests/src/test/resources/testconfiguration.properties 1d16b65 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java f1d90ff 
>   pom.xml dfb29ce 
>   ql/src/test/queries/clientpositive/parquet_read_backward_compatible_files.q 
> 0abbc2f 
>   ql/src/test/queries/clientpositive/parquet_vectorization_0.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_10.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_11.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_12.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_13.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_14.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_15.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_16.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_17.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_5.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_6.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_7.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_8.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_9.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_vectorization_decimal_date.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_div0.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_limit.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_nested_udf.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_not.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_offset_limit.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_part.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_part_project.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_part_varchar.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_vectorization_pushdown.q PRE-CREATION
>   ql/src/test/results/clientpositive/add_part_exist.q.out f8d50ca 
>   ql/src/test/results/clientpositive/alter1.q.out c2efbe5 
>   ql/src/test/results/clientpositive/alter2.q.out 18032ac 
>   ql/src/test/results/clientpositive/alter3.q.out 3bd7288 
>   ql/src/test/results/clientpositive/alter4.q.out ddcb0ed 
>   ql/src/test/results/clientpositive/alter5.q.out 1eb24c2 
>   ql/src/test/results/clientpositive/alter_index.q.out bca4e12 
>   ql/src/test/results/clientpositive/alter_rename_partition.q.out 5702d39 
>   ql/src/test/results/clientpositive/authorization_9.q.out 39e0a56 
>   ql/src/test/results/clientpositive/authorization_show_grant.q.out d0fed81 
>   

[jira] [Created] (HIVE-18068) Replace LocalInterval by Interval in Druid storage handler

2017-11-15 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-18068:
--

     Summary: Replace LocalInterval by Interval in Druid storage handler
         Key: HIVE-18068
         URL: https://issues.apache.org/jira/browse/HIVE-18068
     Project: Hive
  Issue Type: Bug
  Components: Druid integration
    Reporter: Jesus Camacho Rodriguez
    Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 63845: HIVE-15018

2017-11-15 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63845/
---

(Updated Nov. 15, 2017, 5:52 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-15018
https://issues.apache.org/jira/browse/HIVE-15018


Repository: hive-git


Description
---

HIVE-15018


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 54491f41e2588db7e7a64fe178ce47835cc19941
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 7961b2b912f03b6c72ca60b06d7eda5026b691ae
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f5c162d80a6c521715e2d296c38924539e485594
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 64d16750afabbe1a8ba39ab6897c3515f3e82753
  ql/src/java/org/apache/hadoop/hive/ql/plan/AlterMaterializedViewDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 369f8440a25c2329f307d1fa5875006b3bbb8f41
  ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 61c540789791cd09d4f074193b09e94c9cb4ba65
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveOperationType.java 4be42c1935ecbead08c49e0bf234f4f196b6bb7a
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 080880be6e9e5fde81e5c3f27aac8431a8818838
  ql/src/test/queries/clientpositive/materialized_view_create_rewrite.q b17517f76be6809bdb9733248a87825745319775
  ql/src/test/results/clientpositive/beeline/materialized_view_create_rewrite.q.out f6b161b690880a554c13aa02682b461530fa7686
  ql/src/test/results/clientpositive/materialized_view_create_rewrite.q.out f6b161b690880a554c13aa02682b461530fa7686


Diff: https://reviews.apache.org/r/63845/diff/2/

Changes: https://reviews.apache.org/r/63845/diff/1-2/


Testing
---


Thanks,

Jesús Camacho Rodríguez



Review Request 63845: HIVE-15018

2017-11-15 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63845/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-15018
https://issues.apache.org/jira/browse/HIVE-15018


Repository: hive-git


Description
---

HIVE-15018


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 54491f41e2588db7e7a64fe178ce47835cc19941
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 7961b2b912f03b6c72ca60b06d7eda5026b691ae
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f5c162d80a6c521715e2d296c38924539e485594
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 64d16750afabbe1a8ba39ab6897c3515f3e82753
  ql/src/java/org/apache/hadoop/hive/ql/plan/AlterMaterializedViewDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 369f8440a25c2329f307d1fa5875006b3bbb8f41
  ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 61c540789791cd09d4f074193b09e94c9cb4ba65
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveOperationType.java 4be42c1935ecbead08c49e0bf234f4f196b6bb7a
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 080880be6e9e5fde81e5c3f27aac8431a8818838
  ql/src/test/queries/clientpositive/materialized_view_create_rewrite.q b17517f76be6809bdb9733248a87825745319775
  ql/src/test/results/clientpositive/beeline/materialized_view_create_rewrite.q.out f6b161b690880a554c13aa02682b461530fa7686
  ql/src/test/results/clientpositive/materialized_view_create_rewrite.q.out f6b161b690880a554c13aa02682b461530fa7686


Diff: https://reviews.apache.org/r/63845/diff/1/


Testing
---


Thanks,

Jesús Camacho Rodríguez



LIKE UDF optimization

2017-11-15 Thread 万昆
I want to optimize the performance of the LIKE function in a particular
scenario.
Could someone help me review the code?


The JIRA: https://issues.apache.org/jira/browse/HIVE-18055
Thanks