[jira] [Commented] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934126#comment-15934126
 ] 

Hive QA commented on HIVE-16180:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859687/HIVE-16180.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10480 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4260/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4260/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4260/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859687 - PreCommit-HIVE-Build

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.03.patch, HIVE-16180.04.patch, 
> HIVE-16180.1.patch, HIVE-16180.2.patch, Native-mem-spike.png
>
>
> Observed this in an internal test run. There is a native memory leak in the ORC 
> EncodedReaderImpl that can cause the YARN pmem monitor to kill the container 
> running the daemon. Direct byte buffers are null'ed out, but their native memory 
> is not guaranteed to be released until the next full GC. To show this issue, 
> attaching a small test program that allocates 3x256MB direct byte buffers. The 
> first buffer is null'ed out, but its native memory is still in use. The second 
> buffer uses a Cleaner to release its native allocation. The third buffer is also 
> null'ed out, but this time System.gc() is invoked, which cleans up all remaining 
> native memory. Output from the test program is below:
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
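
As a concrete illustration of the behavior shown above, here is a minimal sketch 
(not the attached DirectCleaner.java) of releasing a direct buffer's native 
allocation eagerly on JDK8 by invoking its Cleaner, and of observing direct 
memory through the platform BufferPoolMXBean; the 256MB size mirrors the test 
program, everything else is illustrative.

{code}
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectCleanerSketch {
  // Native memory used by the JVM's "direct" buffer pool.
  static long directMemoryUsed() {
    return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
        .filter(pool -> "direct".equals(pool.getName()))
        .mapToLong(BufferPoolMXBean::getMemoryUsed)
        .sum();
  }

  public static void main(String[] args) {
    ByteBuffer data = ByteBuffer.allocateDirect(256 * 1024 * 1024);
    System.out.println("after allocate: " + directMemoryUsed());

    // JDK8-specific: run the buffer's Cleaner now instead of waiting for a full GC.
    ((sun.nio.ch.DirectBuffer) data).cleaner().clean();
    data = null; // the buffer must not be used after clean()
    System.out.println("after clean():  " + directMemoryUsed());
  }
}
{code}
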
> Longer-term improvements/solutions:
> 1) Use DirectBufferPool from Hadoop or Netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html, as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9.
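
On point (1) above, a pooled allocator amortizes the cost of direct allocation 
by reusing native memory instead of paying the System.gc()-plus-sleep penalty on 
every allocation. This is only a rough sketch of Netty's PooledByteBufAllocator 
usage, not the proposed Hive change.

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class PooledDirectSketch {
  public static void main(String[] args) {
    PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;

    // Direct buffer drawn from the pool rather than freshly allocated.
    ByteBuf buf = alloc.directBuffer(1 << 20);
    try {
      buf.writeLong(42L); // use the buffer
    } finally {
      // release() hands the memory back to the pool (for pooled sizes) instead of freeing it.
      buf.release();
    }
  }
}
{code}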



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16239:
---
Attachment: (was: HIVE-16239.1-branch-2.1.patch)

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> The hiveserver service no longer exists, so we should remove it from the CLI on 
> branch-2.0. After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16239:
---
Attachment: (was: HIVE-16239.1-branch-2.0.patch)

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> The hiveserver service no longer exists, so we should remove it from the CLI on 
> branch-2.0. After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16239:
---
Status: Patch Available  (was: Open)

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.1, 2.0.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.0.patch, HIVE-16239.1-branch-2.1.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> The hiveserver service no longer exists, so we should remove it from the CLI on 
> branch-2.0. After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16239:
---
Status: Open  (was: Patch Available)

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.1, 2.0.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.0.patch, HIVE-16239.1-branch-2.1.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> The hiveserver service no longer exists, so we should remove it from the CLI on 
> branch-2.0. After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16239:
---
Attachment: HIVE-16239.1-branch-2.1.patch
HIVE-16239.1-branch-2.0.patch

Reattaching to trigger a build.

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.0.patch, HIVE-16239.1-branch-2.1.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> The hiveserver service no longer exists, so we should remove it from the CLI on 
> branch-2.0. After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15691:

Target Version/s: 1.3.0, 1.2.2, 2.2.0  (was: 1.2.2)

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter already available in Hive.
> There is a dependent Flume change to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9
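
A rough usage sketch of how such a writer plugs into the hive-hcatalog-streaming 
API, in the same way StrictJsonWriter is used today; the StrictRegexWriter 
constructor arguments shown here (regex, endpoint, conf) are an assumption based 
on the description rather than a confirmed signature, and the endpoint details 
are made up.

{code}
import java.util.Arrays;

import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.RecordWriter;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.StrictRegexWriter;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class RegexWriterSketch {
  public static void main(String[] args) throws Exception {
    HiveEndPoint endPoint = new HiveEndPoint("thrift://metastore-host:9083", "default",
        "web_logs", Arrays.asList("2017-03-20"));
    StreamingConnection conn = endPoint.newConnection(true);

    // Assumed constructor: capture groups of the regex map to the table's columns.
    RecordWriter writer = new StrictRegexWriter("(\\S+) (\\S+) (.*)", endPoint, null);

    TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
    batch.beginNextTransaction();
    batch.write("127.0.0.1 GET /index.html".getBytes("UTF-8"));
    batch.commit();
    batch.close();
    conn.close();
  }
}
{code}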



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15691:

Fix Version/s: (was: 2.2.0)
   (was: 1.2.2)
   (was: 1.3.0)

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter already available in Hive.
> There is a dependent Flume change to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16246) Support auto gather column stats for columns with trailing white spaces

2017-03-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934105#comment-15934105
 ] 

Ashutosh Chauhan commented on HIVE-16246:
-

+1

> Support auto gather column stats for columns with trailing white spaces
> ---
>
> Key: HIVE-16246
> URL: https://issues.apache.org/jira/browse/HIVE-16246
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16246.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16219) metastore notification_log contains serialized message with non functional fields

2017-03-20 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934104#comment-15934104
 ] 

anishek commented on HIVE-16219:


[~thejas]/[~vgumashta] please review 

> metastore notification_log contains serialized message with  non functional 
> fields
> --
>
> Key: HIVE-16219
> URL: https://issues.apache.org/jira/browse/HIVE-16219
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-16219.1.patch, HIVE-16219.1.patch
>
>
> The event notification logs stored in the Hive metastore have JSON-serialized 
> messages stored in the NOTIFICATION_LOG table. These messages also embed the 
> serialized Thrift API objects; for example, for a create table event (see the 
> sketch after the example below):
> {code}
> {
>   "eventType": "CREATE_TABLE",
>   "server": "",
>   "servicePrincipal": "",
>   "db": "default",
>   "table": "a",
>   "tableObjJson": 
> "{\"1\":{\"str\":\"a\"},\"2\":{\"str\":\"default\"},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1489552350},\"5\":{\"i32\":0},\"6\":{\"i32\":0},\"7\":{\"rec\":{\"1\":{\"lst\":[\"rec\",1,{\"1\":{\"str\":\"name\"},\"2\":{\"str\":\"string\"}}]},\"2\":{\"str\":\"file:/tmp/warehouse/a\"},\"3\":{\"str\":\"org.apache.hadoop.mapred.TextInputFormat\"},\"4\":{\"str\":\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\"},\"5\":{\"tf\":0},\"6\":{\"i32\":-1},\"7\":{\"rec\":{\"2\":{\"str\":\"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe\"},\"3\":{\"map\":[\"str\",\"str\",2,{\"field.delim\":\"\\n\",\"serialization.format\":\"\\n\"}]}}},\"8\":{\"lst\":[\"str\",0]},\"9\":{\"lst\":[\"rec\",0]},\"10\":{\"map\":[\"str\",\"str\",0,{}]},\"11\":{\"rec\":{\"1\":{\"lst\":[\"str\",0]},\"2\":{\"lst\":[\"lst\",0]},\"3\":{\"map\":[\"lst\",\"str\",0,{}]}}},\"12\":{\"tf\":0}}},\"8\":{\"lst\":[\"rec\",0]},\"9\":{\"map\":[\"str\",\"str\",7,{\"totalSize\":\"0\",\"EXTERNAL\":\"TRUE\",\"numRows\":\"0\",\"rawDataSize\":\"0\",\"COLUMN_STATS_ACCURATE\":\"{\\\"BASIC_STATS\\\":\\\"true\\\"}\",\"numFiles\":\"0\",\"transient_lastDdlTime\":\"1489552350\"}]},\"12\":{\"str\":\"EXTERNAL_TABLE\"},\"13\":{\"rec\":{\"1\":{\"map\":[\"str\",\"lst\",1,{\"anagarwal\":[\"rec\",4,{\"1\":{\"str\":\"INSERT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"SELECT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"UPDATE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"DELETE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}}]}]}}},\"14\":{\"tf\":0}}",
>   "timestamp": 1489552350,
>   "files": [],
>   "tableObj": {
> "tableName": "a",
> "dbName": "default",
> "owner": "anagarwal",
> "createTime": 1489552350,
> "lastAccessTime": 0,
> "retention": 0,
> "sd": {
>   "cols": [
> {
>   "name": "name",
>   "type": "string",
>   "comment": null,
>   "setName": true,
>   "setType": true,
>   "setComment": false
> }
>   ],
>   "location": "file:/tmp/warehouse/a",
>   "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
>   "outputFormat": 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
>   "compressed": false,
>   "numBuckets": -1,
>   "serdeInfo": {
> "name": null,
> "serializationLib": 
> "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
> "parameters": {
>   "serialization.format": "\n",
>   "field.delim": "\n"
> },
> "setName": false,
> "parametersSize": 2,
> "setParameters": true,
> "setSerializationLib": true
>   },
>   "bucketCols": [],
>   "sortCols": [],
>   "parameters": {},
>   "skewedInfo": {
> "skewedColNames": [],
> "skewedColValues": [],
> "skewedColValueLocationMaps": {},
> "setSkewedColNames": true,
> "setSkewedColValues": true,
> "setSkewedColValueLocationMaps": true,
> "skewedColNamesSize": 0,
> "skewedColNamesIterator": [],
> "skewedColValuesSize": 0,
> "skewedColValuesIterator": [],
> "skewedColValueLocationMapsSize": 0
>   },
>   "storedAsSubDirectories": false,
>   "setSkewedInfo": true,
>   "parametersSize": 0,
>   "colsSize": 1,
>   "setParameters": true,
>   "setLocation": true,
>   "setInputFormat": true,
>   "setCols": true,
>   "setOutputFormat": true,
>   "setSerdeInfo": true,
>   "setBucketCols": true,
>   "setSortCols": true,
>   "colsIterator": [
> {
>   
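
For context on where the non-functional fields in the example above come from, 
here is a hedged sketch contrasting the two serializations visible in the 
message: bean-style JSON serialization of the Thrift-generated Table picks up 
the generated set*/isSet* accessors (the noisy tableObj block), while Thrift's 
own JSON protocol produces the compact field-id form stored in tableObjJson. The 
Jackson usage below illustrates the effect and is not the metastore's actual 
message factory code.

{code}
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TJSONProtocol;

public class NotificationSerializationSketch {
  public static void main(String[] args) throws Exception {
    Table table = new Table();
    table.setTableName("a");
    table.setDbName("default");

    // Bean-style serialization reflects over getters/is* methods, so the output carries
    // bookkeeping properties such as "setTableName": true alongside the real data.
    String beanJson = new ObjectMapper().writeValueAsString(table);

    // Thrift's JSON protocol yields the compact, field-id keyed form seen in "tableObjJson".
    TSerializer thriftJson = new TSerializer(new TJSONProtocol.Factory());
    String compactJson = new String(thriftJson.serialize(table), StandardCharsets.UTF_8);

    System.out.println(beanJson);
    System.out.println(compactJson);
  }
}
{code}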

[jira] [Updated] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15691:

Priority: Critical  (was: Major)
Target Version/s: 1.2.2

Updating the target version and marking this as critical for 1.2.2.

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Critical
> Fix For: 1.3.0, 1.2.2, 2.2.0
>
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter already available in Hive.
> There is a dependent Flume change to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Attachment: HIVE-16254.1.patch

> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16254.1.patch
>
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> A temporary table named values__tmp__table__[number] is created, and it is also 
> present in the dumped information with only metadata; it should not be 
> processed.
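
A minimal sketch of the kind of guard this implies: when iterating tables for a 
REPL DUMP, skip temporary tables such as the generated values__tmp__table__N. 
The Hive metadata calls used here (Hive.getAllTables, Hive.getTable, 
Table.isTemporary) exist, but the surrounding dump loop is purely illustrative 
and not the actual replication code.

{code}
import java.util.List;

import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.Table;

public class DumpFilterSketch {
  // Illustrative only: dump a database's tables, skipping temporary ones.
  static void dumpDatabase(Hive db, String dbName) throws HiveException {
    List<String> tableNames = db.getAllTables(dbName);
    for (String name : tableNames) {
      Table table = db.getTable(dbName, name);
      if (table.isTemporary()) {
        continue; // temporary tables (e.g. VALUES scratch tables) should not be dumped
      }
      // ... write out table metadata (and data) here ...
    }
  }
}
{code}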



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Attachment: (was: HIVE-16254.1.patch)

> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16254.1.patch
>
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> A temporary table named values__tmp__table__[number] is created, and it is also 
> present in the dumped information with only metadata; it should not be 
> processed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Attachment: (was: HIVE-16254.1.patch)

> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16254.1.patch
>
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> A temporary table named values__tmp__table__[number] is created, and it is also 
> present in the dumped information with only metadata; it should not be 
> processed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934100#comment-15934100
 ] 

Vaibhav Gumashta commented on HIVE-15691:
-

[~ekoifman] I haven't created a release tag yet, so to get this into 1.2.2 it 
should be fine to commit it to branch-1.2. I looked at the test failures (from 
the tests that ran and did not time out):
{code}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_from_utc_timestamp 
(batchId=120)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_to_utc_timestamp 
(batchId=33)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
 (batchId=176)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_empty 
(batchId=165)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_auto_smb_mapjoin_14
 (batchId=164)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_exchgpartition2lel 
(batchId=136)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
 (batchId=144)
org.apache.hive.minikdc.TestHiveAuthFactory.testStartTokenManagerForDBTokenStore
 (batchId=427)
org.apache.hive.minikdc.TestHiveAuthFactory.testStartTokenManagerForMemoryTokenStore
 (batchId=427)
org.apache.hive.minikdc.TestMiniHiveKdc.testLogin (batchId=424)
{code}
All of these are flaky tests or need updated diff files, which I have analyzed in 
HIVE-15007, so the test results for this patch look good. However, if the patch is 
revised, it would be good to get another run.

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Fix For: 1.3.0, 1.2.2, 2.2.0
>
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter already available in Hive.
> There is a dependent Flume change to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934094#comment-15934094
 ] 

Hive QA commented on HIVE-15691:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859678/HIVE-15691-branch-1.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 133 failed/errored test(s), 7900 tests 
executed
*Failed tests:*
{noformat}
TestAdminUser - did not produce a TEST-*.xml file (likely timed out) 
(batchId=340)
TestAuthorizationPreEventListener - did not produce a TEST-*.xml file (likely 
timed out) (batchId=371)
TestAuthzApiEmbedAuthorizerInEmbed - did not produce a TEST-*.xml file (likely 
timed out) (batchId=350)
TestAuthzApiEmbedAuthorizerInRemote - did not produce a TEST-*.xml file (likely 
timed out) (batchId=356)
TestBeeLineWithArgs - did not produce a TEST-*.xml file (likely timed out) 
(batchId=378)
TestCLIAuthzSessionContext - did not produce a TEST-*.xml file (likely timed 
out) (batchId=394)
TestClientSideAuthorizationProvider - did not produce a TEST-*.xml file (likely 
timed out) (batchId=370)
TestCompactor - did not produce a TEST-*.xml file (likely timed out) 
(batchId=360)
TestCreateUdfEntities - did not produce a TEST-*.xml file (likely timed out) 
(batchId=359)
TestCustomAuthentication - did not produce a TEST-*.xml file (likely timed out) 
(batchId=379)
TestDBTokenStore - did not produce a TEST-*.xml file (likely timed out) 
(batchId=325)
TestDDLWithRemoteMetastoreSecondNamenode - did not produce a TEST-*.xml file 
(likely timed out) (batchId=358)
TestDynamicSerDe - did not produce a TEST-*.xml file (likely timed out) 
(batchId=328)
TestEmbeddedHiveMetaStore - did not produce a TEST-*.xml file (likely timed 
out) (batchId=337)
TestEmbeddedThriftBinaryCLIService - did not produce a TEST-*.xml file (likely 
timed out) (batchId=382)
TestFilterHooks - did not produce a TEST-*.xml file (likely timed out) 
(batchId=332)
TestFolderPermissions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=365)
TestHS2AuthzContext - did not produce a TEST-*.xml file (likely timed out) 
(batchId=397)
TestHS2AuthzSessionContext - did not produce a TEST-*.xml file (likely timed 
out) (batchId=398)
TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file (likely 
timed out) (batchId=386)
TestHiveAuthorizerCheckInvocation - did not produce a TEST-*.xml file (likely 
timed out) (batchId=374)
TestHiveAuthorizerShowFilters - did not produce a TEST-*.xml file (likely timed 
out) (batchId=373)
TestHiveHistory - did not produce a TEST-*.xml file (likely timed out) 
(batchId=376)
TestHiveMetaStoreTxns - did not produce a TEST-*.xml file (likely timed out) 
(batchId=352)
TestHiveMetaStoreWithEnvironmentContext - did not produce a TEST-*.xml file 
(likely timed out) (batchId=342)
TestHiveMetaTool - did not produce a TEST-*.xml file (likely timed out) 
(batchId=355)
TestHiveServer2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=400)
TestHiveServer2SessionTimeout - did not produce a TEST-*.xml file (likely timed 
out) (batchId=401)
TestHiveSessionImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=383)
TestHs2Hooks - did not produce a TEST-*.xml file (likely timed out) 
(batchId=357)
TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file (likely timed out) 
(batchId=429)
TestJdbcDriver2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=388)
TestJdbcMetadataApiAuth - did not produce a TEST-*.xml file (likely timed out) 
(batchId=399)
TestJdbcWithLocalClusterSpark - did not produce a TEST-*.xml file (likely timed 
out) (batchId=393)
TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=390)
TestJdbcWithMiniKdc - did not produce a TEST-*.xml file (likely timed out) 
(batchId=426)
TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file (likely timed 
out) (batchId=425)
TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file (likely 
timed out) (batchId=423)
TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file (likely 
timed out) (batchId=428)
TestJdbcWithMiniMr - did not produce a TEST-*.xml file (likely timed out) 
(batchId=389)
TestJdbcWithSQLAuthUDFBlacklist - did not produce a TEST-*.xml file (likely 
timed out) (batchId=395)
TestJdbcWithSQLAuthorization - did not produce a TEST-*.xml file (likely timed 
out) (batchId=396)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) 
(batchId=363)
TestMTQueries - did not produce a TEST-*.xml file (likely timed out) 
(batchId=361)
TestMarkPartition - did not produce a TEST-*.xml file (likely timed out) 
(batchId=349)
TestMarkPartitionRemote - did not produce a TEST-*.xml file (likely timed out) 
(batchId=353)
TestMetaStoreAuthorization - did not produce a TEST-*.xml file (likely timed 
out) 

[jira] [Commented] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934091#comment-15934091
 ] 

Ashutosh Chauhan commented on HIVE-16178:
-

+1

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16178.1.patch, HIVE-16178.2.patch
>
>
> h3. corr
> The standard defines corner cases in which it should return null, but the 
> current result is NaN:
> If N * SUMX2 equals SUMX * SUMX, then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY, then the result is the null value.
> h3. covar_samp
> Returns 0 instead of the null value required when N is 1:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> The standard uses (y,x) order, and some of the function names also contain X 
> and Y, so the order does matter. Currently at least corr uses (x,y) order, 
> which is okay because it is symmetric, but it would be great to have the same 
> order everywhere (check the others).
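
To make the corr corner case concrete, the sketch below computes corr from 
running sums (N, SUMX, SUMY, SUMXY, SUMX2, SUMY2): when N * SUMX2 equals 
SUMX * SUMX (zero variance in x), the floating-point result is NaN, whereas the 
standard asks for null. The class is hypothetical and only illustrates the 
arithmetic.

{code}
public class CorrCornerCase {
  // corr from running sums; the denominator vanishes exactly when N*SUMX2 == SUMX*SUMX
  // (or the symmetric condition for y), which is the case the standard maps to null.
  static double corr(long n, double sumX, double sumY,
                     double sumXY, double sumX2, double sumY2) {
    double num = n * sumXY - sumX * sumY;
    double den = Math.sqrt(n * sumX2 - sumX * sumX) * Math.sqrt(n * sumY2 - sumY * sumY);
    return num / den; // 0/0 -> NaN in doubles; a compliant UDAF would return null here
  }

  public static void main(String[] args) {
    // x = (1, 1, 1) is constant, y = (1, 2, 3): N*SUMX2 == SUMX*SUMX, so corr is undefined.
    System.out.println(corr(3, 3.0, 6.0, 6.0, 3.0, 14.0)); // prints NaN
  }
}
{code}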



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15776) Flaky test: TestMiniLlapLocalCliDriver vector_if_expr

2017-03-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934069#comment-15934069
 ] 

Ashutosh Chauhan commented on HIVE-15776:
-

[~mmccline] [~teddy.choi] Any ideas here? This test fails from time to time.

> Flaky test: TestMiniLlapLocalCliDriver vector_if_expr
> -
>
> Key: HIVE-15776
> URL: https://issues.apache.org/jira/browse/HIVE-15776
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Thejas M Nair
>Priority: Critical
>
> Failed in https://builds.apache.org/job/PreCommit-HIVE-Build/3274/ with the 
> following error in the test log:
> java.lang.AssertionError: 
> Unexpected exception java.lang.AssertionError: Client execution failed with 
> error code = 2 running 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16251) Vectorization: new octet_length function (HIVE-15979) get NPE

2017-03-20 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16251:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Teddy!

> Vectorization: new octet_length function (HIVE-15979) get NPE
> -
>
> Key: HIVE-16251
> URL: https://issues.apache.org/jira/browse/HIVE-16251
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16251.1.patch, HIVE-16251.2.patch
>
>
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.OctetLength.evaluate(OctetLength.java:53)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:125)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:900)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
> {code}
> I think it is the use of transient:
> {code}
>   private transient int colNum;
>   private transient int outputColumn;
> {code}
> Transient fields lose their values when the plan is serialized for execution.  
> At execution time both are 0. Not sure why the tests didn't fail.
> CC: [~ashutoshc]
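
As a small illustration of the suspected cause (Hive serializes the operator 
plan with its own mechanism, but the effect on transient fields is the same): a 
field marked transient is skipped during serialization, so a value set at 
planning time comes back as 0 at execution time. The class below is 
hypothetical.

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {
  static class Expr implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient int colNum; // skipped by serialization, like the fields in the patch
    private int outputColumn;     // kept, for contrast

    Expr(int colNum, int outputColumn) {
      this.colNum = colNum;
      this.outputColumn = outputColumn;
    }
  }

  public static void main(String[] args) throws Exception {
    Expr planned = new Expr(7, 3);

    // Round-trip through Java serialization, standing in for the plan -> task transfer.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(planned);
    }
    Expr executed;
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      executed = (Expr) in.readObject();
    }

    System.out.println("colNum=" + executed.colNum);             // 0: transient value lost
    System.out.println("outputColumn=" + executed.outputColumn); // 3: survived the round trip
  }
}
{code}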



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Attachment: HIVE-16254.1.patch

The build system did not pick this one up; attaching again.

> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16254.1.patch, HIVE-16254.1.patch
>
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> A temporary table named values__tmp__table__[number] is created, and it is also 
> present in the dumped information with only metadata; it should not be 
> processed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16219) metastore notification_log contains serialized message with non functional fields

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16219:
---
Attachment: HIVE-16219.1.patch

No idea what is happening; the build system keeps failing without running the 
builds, so attaching again.

> metastore notification_log contains serialized message with  non functional 
> fields
> --
>
> Key: HIVE-16219
> URL: https://issues.apache.org/jira/browse/HIVE-16219
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-16219.1.patch, HIVE-16219.1.patch
>
>
> The event notification logs stored in the Hive metastore have JSON-serialized 
> messages stored in the NOTIFICATION_LOG table. These messages also embed the 
> serialized Thrift API objects; for example, for a create table event:
> {code}
> {
>   "eventType": "CREATE_TABLE",
>   "server": "",
>   "servicePrincipal": "",
>   "db": "default",
>   "table": "a",
>   "tableObjJson": 
> "{\"1\":{\"str\":\"a\"},\"2\":{\"str\":\"default\"},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1489552350},\"5\":{\"i32\":0},\"6\":{\"i32\":0},\"7\":{\"rec\":{\"1\":{\"lst\":[\"rec\",1,{\"1\":{\"str\":\"name\"},\"2\":{\"str\":\"string\"}}]},\"2\":{\"str\":\"file:/tmp/warehouse/a\"},\"3\":{\"str\":\"org.apache.hadoop.mapred.TextInputFormat\"},\"4\":{\"str\":\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\"},\"5\":{\"tf\":0},\"6\":{\"i32\":-1},\"7\":{\"rec\":{\"2\":{\"str\":\"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe\"},\"3\":{\"map\":[\"str\",\"str\",2,{\"field.delim\":\"\\n\",\"serialization.format\":\"\\n\"}]}}},\"8\":{\"lst\":[\"str\",0]},\"9\":{\"lst\":[\"rec\",0]},\"10\":{\"map\":[\"str\",\"str\",0,{}]},\"11\":{\"rec\":{\"1\":{\"lst\":[\"str\",0]},\"2\":{\"lst\":[\"lst\",0]},\"3\":{\"map\":[\"lst\",\"str\",0,{}]}}},\"12\":{\"tf\":0}}},\"8\":{\"lst\":[\"rec\",0]},\"9\":{\"map\":[\"str\",\"str\",7,{\"totalSize\":\"0\",\"EXTERNAL\":\"TRUE\",\"numRows\":\"0\",\"rawDataSize\":\"0\",\"COLUMN_STATS_ACCURATE\":\"{\\\"BASIC_STATS\\\":\\\"true\\\"}\",\"numFiles\":\"0\",\"transient_lastDdlTime\":\"1489552350\"}]},\"12\":{\"str\":\"EXTERNAL_TABLE\"},\"13\":{\"rec\":{\"1\":{\"map\":[\"str\",\"lst\",1,{\"anagarwal\":[\"rec\",4,{\"1\":{\"str\":\"INSERT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"SELECT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"UPDATE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"DELETE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}}]}]}}},\"14\":{\"tf\":0}}",
>   "timestamp": 1489552350,
>   "files": [],
>   "tableObj": {
> "tableName": "a",
> "dbName": "default",
> "owner": "anagarwal",
> "createTime": 1489552350,
> "lastAccessTime": 0,
> "retention": 0,
> "sd": {
>   "cols": [
> {
>   "name": "name",
>   "type": "string",
>   "comment": null,
>   "setName": true,
>   "setType": true,
>   "setComment": false
> }
>   ],
>   "location": "file:/tmp/warehouse/a",
>   "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
>   "outputFormat": 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
>   "compressed": false,
>   "numBuckets": -1,
>   "serdeInfo": {
> "name": null,
> "serializationLib": 
> "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
> "parameters": {
>   "serialization.format": "\n",
>   "field.delim": "\n"
> },
> "setName": false,
> "parametersSize": 2,
> "setParameters": true,
> "setSerializationLib": true
>   },
>   "bucketCols": [],
>   "sortCols": [],
>   "parameters": {},
>   "skewedInfo": {
> "skewedColNames": [],
> "skewedColValues": [],
> "skewedColValueLocationMaps": {},
> "setSkewedColNames": true,
> "setSkewedColValues": true,
> "setSkewedColValueLocationMaps": true,
> "skewedColNamesSize": 0,
> "skewedColNamesIterator": [],
> "skewedColValuesSize": 0,
> "skewedColValuesIterator": [],
> "skewedColValueLocationMapsSize": 0
>   },
>   "storedAsSubDirectories": false,
>   "setSkewedInfo": true,
>   "parametersSize": 0,
>   "colsSize": 1,
>   "setParameters": true,
>   "setLocation": true,
>   "setInputFormat": true,
>   "setCols": true,
>   "setOutputFormat": true,
>   "setSerdeInfo": true,
>   "setBucketCols": true,
>   

[jira] [Updated] (HIVE-16219) metastore notification_log contains serialized message with non functional fields

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16219:
---
Attachment: (was: HIVE-16219.1.patch)

> metastore notification_log contains serialized message with  non functional 
> fields
> --
>
> Key: HIVE-16219
> URL: https://issues.apache.org/jira/browse/HIVE-16219
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-16219.1.patch, HIVE-16219.1.patch
>
>
> The event notification logs stored in the Hive metastore have JSON-serialized 
> messages stored in the NOTIFICATION_LOG table. These messages also embed the 
> serialized Thrift API objects; for example, for a create table event:
> {code}
> {
>   "eventType": "CREATE_TABLE",
>   "server": "",
>   "servicePrincipal": "",
>   "db": "default",
>   "table": "a",
>   "tableObjJson": 
> "{\"1\":{\"str\":\"a\"},\"2\":{\"str\":\"default\"},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1489552350},\"5\":{\"i32\":0},\"6\":{\"i32\":0},\"7\":{\"rec\":{\"1\":{\"lst\":[\"rec\",1,{\"1\":{\"str\":\"name\"},\"2\":{\"str\":\"string\"}}]},\"2\":{\"str\":\"file:/tmp/warehouse/a\"},\"3\":{\"str\":\"org.apache.hadoop.mapred.TextInputFormat\"},\"4\":{\"str\":\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\"},\"5\":{\"tf\":0},\"6\":{\"i32\":-1},\"7\":{\"rec\":{\"2\":{\"str\":\"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe\"},\"3\":{\"map\":[\"str\",\"str\",2,{\"field.delim\":\"\\n\",\"serialization.format\":\"\\n\"}]}}},\"8\":{\"lst\":[\"str\",0]},\"9\":{\"lst\":[\"rec\",0]},\"10\":{\"map\":[\"str\",\"str\",0,{}]},\"11\":{\"rec\":{\"1\":{\"lst\":[\"str\",0]},\"2\":{\"lst\":[\"lst\",0]},\"3\":{\"map\":[\"lst\",\"str\",0,{}]}}},\"12\":{\"tf\":0}}},\"8\":{\"lst\":[\"rec\",0]},\"9\":{\"map\":[\"str\",\"str\",7,{\"totalSize\":\"0\",\"EXTERNAL\":\"TRUE\",\"numRows\":\"0\",\"rawDataSize\":\"0\",\"COLUMN_STATS_ACCURATE\":\"{\\\"BASIC_STATS\\\":\\\"true\\\"}\",\"numFiles\":\"0\",\"transient_lastDdlTime\":\"1489552350\"}]},\"12\":{\"str\":\"EXTERNAL_TABLE\"},\"13\":{\"rec\":{\"1\":{\"map\":[\"str\",\"lst\",1,{\"anagarwal\":[\"rec\",4,{\"1\":{\"str\":\"INSERT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"SELECT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"UPDATE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"DELETE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}}]}]}}},\"14\":{\"tf\":0}}",
>   "timestamp": 1489552350,
>   "files": [],
>   "tableObj": {
> "tableName": "a",
> "dbName": "default",
> "owner": "anagarwal",
> "createTime": 1489552350,
> "lastAccessTime": 0,
> "retention": 0,
> "sd": {
>   "cols": [
> {
>   "name": "name",
>   "type": "string",
>   "comment": null,
>   "setName": true,
>   "setType": true,
>   "setComment": false
> }
>   ],
>   "location": "file:/tmp/warehouse/a",
>   "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
>   "outputFormat": 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
>   "compressed": false,
>   "numBuckets": -1,
>   "serdeInfo": {
> "name": null,
> "serializationLib": 
> "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
> "parameters": {
>   "serialization.format": "\n",
>   "field.delim": "\n"
> },
> "setName": false,
> "parametersSize": 2,
> "setParameters": true,
> "setSerializationLib": true
>   },
>   "bucketCols": [],
>   "sortCols": [],
>   "parameters": {},
>   "skewedInfo": {
> "skewedColNames": [],
> "skewedColValues": [],
> "skewedColValueLocationMaps": {},
> "setSkewedColNames": true,
> "setSkewedColValues": true,
> "setSkewedColValueLocationMaps": true,
> "skewedColNamesSize": 0,
> "skewedColNamesIterator": [],
> "skewedColValuesSize": 0,
> "skewedColValuesIterator": [],
> "skewedColValueLocationMapsSize": 0
>   },
>   "storedAsSubDirectories": false,
>   "setSkewedInfo": true,
>   "parametersSize": 0,
>   "colsSize": 1,
>   "setParameters": true,
>   "setLocation": true,
>   "setInputFormat": true,
>   "setCols": true,
>   "setOutputFormat": true,
>   "setSerdeInfo": true,
>   "setBucketCols": true,
>   "setSortCols": true,
>   "colsIterator": [
> {
>   "name": "name",
>   

[jira] [Commented] (HIVE-16251) Vectorization: new octet_length function (HIVE-15979) get NPE

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934039#comment-15934039
 ] 

Hive QA commented on HIVE-16251:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859676/HIVE-16251.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10480 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4258/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4258/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4258/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859676 - PreCommit-HIVE-Build

> Vectorization: new octet_length function (HIVE-15979) get NPE
> -
>
> Key: HIVE-16251
> URL: https://issues.apache.org/jira/browse/HIVE-16251
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16251.1.patch, HIVE-16251.2.patch
>
>
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.OctetLength.evaluate(OctetLength.java:53)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:125)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:900)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
> {code}
> I think it is the use of transient:
> {code}
>   private transient int colNum;
>   private transient int outputColumn;
> {code}
> Transient fields lose their values when the plan is serialized for execution.  
> At execution time both are 0. Not sure why the tests didn't fail.
> CC: [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16239:
---
Status: Open  (was: Patch Available)

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.1, 2.0.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> The hiveserver service no longer exists, so we should remove it from the CLI on 
> branch-2.0. After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16239:
---
Status: Patch Available  (was: Open)

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.1, 2.0.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> hiveserver no longer exists, so we should remove it from the CLI on branch-2.0.
> After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934000#comment-15934000
 ] 

Hive QA commented on HIVE-13567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859670/HIVE-13567.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 54 failed/errored test(s), 10481 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby10] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby5] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby5_noskew] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby7] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby7_map_multi_single_reducer]
 (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby7_noskew] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby7_noskew_multi_single_reducer]
 (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby8] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby8_map] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby8_noskew] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_2] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_7] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_multi_insert]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input12] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input13] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input39] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_7] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[loadpart1] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby4] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_move_tasks_share_dependencies]
 (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_with_join2] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multigroupby_singlemr] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_wide_table] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_coltype_literals]
 (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_multi_insert] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_create_table_serde] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_join_partition_key] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_character_length] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_octet_length] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_round_2] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union19] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union31] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_multi_insert] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_character_length]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_octet_length] 
(batchId=2)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_unencrypted_tbl]
 (batchId=161)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[column_names_with_leading_and_trailing_spaces]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[bucket_num_reducers]
 (batchId=85)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_reducers_power_two]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[columnstats_partlvl_invalid_values]
 (batchId=86)

[jira] [Commented] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933997#comment-15933997
 ] 

Deepak Jaiswal commented on HIVE-16260:
---

https://reviews.apache.org/r/57794/

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch, HIVE-16260.2.patch
>
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933992#comment-15933992
 ] 

Eugene Koifman commented on HIVE-15691:
---

[~kalyanhadoop], sorry if I wasn't clear.  The Hive 1 patch should have both 
versions of the constructors; the ones without a Streaming connection should be deprecated.  
Basically it should mirror the other Streaming Writers.
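
A minimal sketch of the constructor layout being asked for, using hypothetical stand-in types rather than the real streaming API classes (illustrative only, not the actual patch):
{code}
// Hypothetical stand-in types; the real writer would take the endpoint/connection
// objects from the streaming API instead.
class Endpoint {}
class Connection {}

public class RegexWriterSketch {
  private final String regex;
  private final Endpoint endpoint;
  private final Connection connection;   // null when the deprecated c'tor is used

  /** @deprecated kept only for backward compatibility on the 1.x line. */
  @Deprecated
  public RegexWriterSketch(String regex, Endpoint endpoint) {
    this(regex, endpoint, null);
  }

  /** Preferred constructor: mirrors the other streaming writers. */
  public RegexWriterSketch(String regex, Endpoint endpoint, Connection connection) {
    this.regex = regex;
    this.endpoint = endpoint;
    this.connection = connection;
  }
}
{code}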

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Fix For: 1.3.0, 1.2.2, 2.2.0
>
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> A dependent change in Flume needs to be committed:
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933990#comment-15933990
 ] 

Eugene Koifman commented on HIVE-15691:
---

[~vgumashta] I'd like to try to get this into 1.2.2.  Technically this is not a 
bug fix, so I don't know what rules we have with respect to this.

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Fix For: 1.3.0, 1.2.2, 2.2.0
>
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> A dependent change in Flume needs to be committed:
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15691:
--
Fix Version/s: 2.2.0
   1.2.2
   1.3.0

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Fix For: 1.3.0, 1.2.2, 2.2.0
>
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> A dependent change in Flume needs to be committed:
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13517) Hive logs in Spark Executor and Driver should show thread-id.

2017-03-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933988#comment-15933988
 ] 

Sahil Takiar commented on HIVE-13517:
-

[~kellyzly] yes, that's correct, but I think the default {{log4j.properties}} 
file in Spark is 
https://github.com/apache/spark/blob/master/conf/log4j.properties.template - 
which doesn't contain %t (thread name) in the layout pattern.

I think the goal of this JIRA is to specify a different {{log4j.properties}} 
for Spark executors such that the thread name is in the layout pattern by 
default. This should make debugging HoS jobs much easier.

[~szehon] is my understanding correct?
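
A sketch of what such an executor-side {{log4j.properties}} could look like, assuming the template linked above with the thread name added to the pattern (not the actual file from any patch):
{code}
# sketch only: Spark's log4j.properties.template with [%t] added to the layout
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p [%t] %c{1}: %m%n
{code}
It could then be supplied to the executors with something like {{spark.executor.extraJavaOptions=-Dlog4j.configuration=...}}, though the exact wiring is a separate question.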

> Hive logs in Spark Executor and Driver should show thread-id.
> -
>
> Key: HIVE-13517
> URL: https://issues.apache.org/jira/browse/HIVE-13517
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Szehon Ho
>Assignee: liyunzhang_intel
> Attachments: executor-driver-log.PNG
>
>
> In Spark, there might be more than one task running in one executor. 
> Similarly, there may be more than one thread running in Driver.
> This makes debugging through the logs a nightmare. It would be great if there 
> could be thread-ids in the logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16180:

Attachment: HIVE-16180.04.patch

Simplifying the release logic - we can just remember the buffers at the beginning. It 
could be simplified even more by removing all the early release logic.

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.03.patch, HIVE-16180.04.patch, 
> HIVE-16180.1.patch, HIVE-16180.2.patch, Native-mem-spike.png
>
>
> Observed this in internal test run. There is a native memory leak in Orc 
> EncodedReaderImpl that can cause YARN pmem monitor to kill the container 
> running the daemon. Direct byte buffers are null'ed out which is not 
> guaranteed to be cleaned until next Full GC. To show this issue, attaching a 
> small test program that allocates 3x256MB direct byte buffers. The first buffer 
> is null'ed out but native memory is still used. The second buffer uses a Cleaner to 
> clean up its native allocation. The third buffer is also null'ed, but this time 
> System.gc() is invoked, which cleans up all native memory. Output from the 
> test program is below
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
> Longer term improvements/solutions:
> 1) Use DirectBufferPool from hadoop or netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9
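
For reference, a minimal sketch of the kind of explicit cleanup discussed here (JDK 8 only; it relies on JDK-internal classes, which is exactly why the longer-term options above are preferable):
{code}
import java.nio.ByteBuffer;

public class DirectCleanerSketch {
  // Explicitly release a direct buffer's native memory instead of waiting for a
  // full GC. Uses sun.nio.ch.DirectBuffer / sun.misc.Cleaner, so this is
  // illustrative JDK 8 code, not a portable API.
  static void clean(ByteBuffer buffer) {
    if (buffer != null && buffer.isDirect() && buffer instanceof sun.nio.ch.DirectBuffer) {
      sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
      if (cleaner != null) {
        cleaner.clean();          // frees the native allocation immediately
      }
    }
  }

  public static void main(String[] args) {
    ByteBuffer data = ByteBuffer.allocateDirect(256 * 1024 * 1024);
    clean(data);                  // native memory is released without a System.gc()
  }
}
{code}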



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16180:
---

Assignee: Sergey Shelukhin  (was: Prasanth Jayachandran)

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.03.patch, HIVE-16180.1.patch, 
> HIVE-16180.2.patch, Native-mem-spike.png
>
>
> Observed this in internal test run. There is a native memory leak in Orc 
> EncodedReaderImpl that can cause YARN pmem monitor to kill the container 
> running the daemon. Direct byte buffers are null'ed out which is not 
> guaranteed to be cleaned until next Full GC. To show this issue, attaching a 
> small test program that allocates 3x256MB direct byte buffers. The first buffer 
> is null'ed out but native memory is still used. The second buffer uses a Cleaner to 
> clean up its native allocation. The third buffer is also null'ed, but this time 
> System.gc() is invoked, which cleans up all native memory. Output from the 
> test program is below
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
> Longer term improvements/solutions:
> 1) Use DirectBufferPool from hadoop or netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16251) Vectorization: new octet_length function (HIVE-15979) get NPE

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933962#comment-15933962
 ] 

Hive QA commented on HIVE-16251:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859676/HIVE-16251.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10480 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=141)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=218)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4256/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4256/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4256/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859676 - PreCommit-HIVE-Build

> Vectorization: new octet_length function (HIVE-15979) get NPE
> -
>
> Key: HIVE-16251
> URL: https://issues.apache.org/jira/browse/HIVE-16251
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16251.1.patch, HIVE-16251.2.patch
>
>
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.OctetLength.evaluate(OctetLength.java:53)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:125)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:900)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
> {code}
> I think it is the use of transient:
> {code}
>   private transient int colNum;
>   private transient int outputColumn;
> {code}
> Transient columns lose their value when serialized from plan to execution.  
> On execution both are 0.  Not sure why tests didn't fail.
> CC: [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15912) Executor kill task and Failed to get spark memory/core info

2017-03-20 Thread Yi Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Yao updated HIVE-15912:
--



> Executor kill task and Failed to get spark memory/core info
> ---
>
> Key: HIVE-15912
> URL: https://issues.apache.org/jira/browse/HIVE-15912
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: hadoop2.7.1
> spark2.0.2
> Hive2.2
>Reporter: KaiXu
>
> Hive on Spark, failed with error:
> Starting Spark Job = 12a8cb8c-ed0d-4049-ae06-8d32d13fe285
> Failed to monitor Job[ 6] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> Hive's log:
> 2017-02-14T19:03:09,147  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/14 19:03:09 INFO yarn.Client: Application report for 
> application_1486905599813_0403 (state: ACCEPTED)
> 2017-02-14T19:03:10,817  WARN [5bcf13e5-cb54-4cfe-a0d4-9a6556ab48b1 main] 
> spark.SetSparkReducerParallelism: Failed to get spark memory/core info
> java.util.concurrent.TimeoutException
> at 
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) 
> ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
> at 
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:155)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:165)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:119)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:291)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:120)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11085)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:279)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:510) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1302) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1442) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (HIVE-15912) Executor kill task and Failed to get spark memory/core info

2017-03-20 Thread KaiXu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933961#comment-15933961
 ] 

KaiXu commented on HIVE-15912:
--

Hi [~lirui], I am using Hive 2.2 on Spark 2.0.2, and the issue still exists.

2017-03-21 03:02:30,454 Stage-5_0: 241/241 Finished Stage-6_0: 161(+1)/162  
Stage-7_0: 0/2018   Stage-8_0: 0/1009   Stage-9_0: 0/1009
Failed to monitor Job[4] with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(java.util.concurrent.TimeoutException)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask

In Hive's log I also found the TimeoutException, logged as WARN as well as ERROR:

2017-03-21T03:02:31,466  INFO [RPC-Handler-3] rpc.RpcDispatcher: 
[ClientProtocol] Closing channel due to exception in pipeline 
(org.apache.hive.spark.client.SparkClientImpl$ClientProtocol.handle(io.netty.channel.ChannelHandlerContext,
 org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
2017-03-21T03:02:31,468  WARN [RPC-Handler-3] rpc.RpcDispatcher: 
[ClientProtocol] Expected RPC header, got org.apache.spark.SparkJobInfoImpl 
instead.
2017-03-21T03:02:31,468  INFO [RPC-Handler-3] rpc.RpcDispatcher: 
[ClientProtocol] Closing channel due to exception in pipeline (null).
2017-03-21T03:02:31,469  WARN [RPC-Handler-3] client.SparkClientImpl: Client 
RPC channel closed unexpectedly.
2017-03-21T03:03:31,457  WARN [Thread-349] impl.RemoteSparkJobStatus: Failed to 
get job info.
java.util.concurrent.TimeoutException
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) 
~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkJobInfo(RemoteSparkJobStatus.java:171)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getStageIds(RemoteSparkJobStatus.java:87)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:94)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:84)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:116) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
2017-03-21T03:03:31,457 ERROR [Thread-349] status.SparkJobMonitor: Failed to 
monitor Job[4] with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(java.util.concurrent.TimeoutException)'
org.apache.hadoop.hive.ql.metadata.HiveException: 
java.util.concurrent.TimeoutException
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkJobInfo(RemoteSparkJobStatus.java:174)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getStageIds(RemoteSparkJobStatus.java:87)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:94)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:84)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:116) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
Caused by: java.util.concurrent.TimeoutException
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) 
~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at 

[jira] [Commented] (HIVE-16239) remove useless hiveserver

2017-03-20 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933956#comment-15933956
 ] 

Fei Hui commented on HIVE-16239:


ping [~prasanth_j] [~sershe] [~Ferd]

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> hiveserver no longer exists, so we should remove it from the CLI on branch-2.0.
> After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13517) Hive logs in Spark Executor and Driver should show thread-id.

2017-03-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933954#comment-15933954
 ] 

liyunzhang_intel commented on HIVE-13517:
-

[~stakiar]: it is OK to assign it to you.
What confuses me is that the log patterns in Hive and Spark are as follows:
spark/conf/log4j.properties
{code}
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %t %c{1}: %m%n
{code}

hive/conf/hive-log4j.properties
{code}
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
{code}

The thread name (%t) is included in both patterns, so currently we can already see the 
thread name in the log. Is my understanding right?

> Hive logs in Spark Executor and Driver should show thread-id.
> -
>
> Key: HIVE-13517
> URL: https://issues.apache.org/jira/browse/HIVE-13517
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Szehon Ho
>Assignee: liyunzhang_intel
> Attachments: executor-driver-log.PNG
>
>
> In Spark, there might be more than one task running in one executor. 
> Similarly, there may be more than one thread running in Driver.
> This makes debugging through the logs a nightmare. It would be great if there 
> could be thread-ids in the logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13517) Hive logs in Spark Executor and Driver should show thread-id.

2017-03-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933927#comment-15933927
 ] 

Sahil Takiar commented on HIVE-13517:
-

[~kellyzly] I think I may have an approach to fix this. Are you actively 
working on this, or do you mind if I assign it to myself?

> Hive logs in Spark Executor and Driver should show thread-id.
> -
>
> Key: HIVE-13517
> URL: https://issues.apache.org/jira/browse/HIVE-13517
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Szehon Ho
>Assignee: liyunzhang_intel
> Attachments: executor-driver-log.PNG
>
>
> In Spark, there might be more than one task running in one executor. 
> Similarly, there may be more than one thread running in Driver.
> This makes debugging through the logs a nightmare. It would be great if there 
> could be thread-ids in the logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933895#comment-15933895
 ] 

Deepak Jaiswal commented on HIVE-16260:
---

The last run was clean. The only failing test has age 16, i.e. it was already failing before this patch.

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch, HIVE-16260.2.patch
>
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933887#comment-15933887
 ] 

Hive QA commented on HIVE-16260:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859665/HIVE-16260.2.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10480 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4255/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4255/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4255/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859665 - PreCommit-HIVE-Build

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch, HIVE-16260.2.patch
>
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-20 Thread Kalyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kalyan updated HIVE-15691:
--
Attachment: HIVE-15691-branch-1.2.patch
HIVE-15691.3.patch

Hi [~ekoifman], [~roshan_naik]

As per [~ekoifman]'s comments, I removed the old constructors in hive-2.x.

Updated patch (HIVE-15691-branch-1.2.patch) on branch-1.2.
Updated patch (HIVE-15691.3.patch) on master.

Can you please take a look?
Thanks

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> A dependent change in Flume needs to be committed:
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16251) Vectorization: new octet_length function (HIVE-15979) get NPE

2017-03-20 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16251:
--
Attachment: HIVE-16251.2.patch

Added changes in 
ql/src/test/results/clientpositive/vector_udf_octet_length.q.out and 
ql/src/test/results/clientpositive/vector_udf_character_length.q.out, too.

> Vectorization: new octet_length function (HIVE-15979) get NPE
> -
>
> Key: HIVE-16251
> URL: https://issues.apache.org/jira/browse/HIVE-16251
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16251.1.patch, HIVE-16251.2.patch
>
>
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.OctetLength.evaluate(OctetLength.java:53)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:125)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:900)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
> {code}
> I think it is the use of transient:
> {code}
>   private transient int colNum;
>   private transient int outputColumn;
> {code}
> Transient columns lose their value when serialized from plan to execution.  
> On execution both are 0.  Not sure why tests didn't fail.
> CC: [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16251) Vectorization: new octet_length function (HIVE-15979) get NPE

2017-03-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933831#comment-15933831
 ] 

Ashutosh Chauhan commented on HIVE-16251:
-

+1

> Vectorization: new octet_length function (HIVE-15979) get NPE
> -
>
> Key: HIVE-16251
> URL: https://issues.apache.org/jira/browse/HIVE-16251
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16251.1.patch
>
>
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.OctetLength.evaluate(OctetLength.java:53)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:125)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:900)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
> {code}
> I think it is the use of transient:
> {code}
>   private transient int colNum;
>   private transient int outputColumn;
> {code}
> Transient columns lose their value when serialized from plan to execution.  
> On execution both are 0.  Not sure why tests didn't fail.
> CC: [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16262) Inconsistent result when casting integer to timestamp

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16262:
--


> Inconsistent result when casting integer to timestamp
> -
>
> Key: HIVE-16262
> URL: https://issues.apache.org/jira/browse/HIVE-16262
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> As reported by [~jcamachorodriguez]:
> {code}
> To give a concrete example, consider the following query:
> select cast(0 as timestamp) from src limit 1;
> The result if Hive is running in Santa Clara is:
> 1969-12-31 16:00:00
> While the result if Hive is running in London is:
> 1970-01-01 00:00:00
> This is not the behavior defined by the standard for TIMESTAMP type. The 
> result should be consistent; in this case the correct result is:
> 1970-01-01 00:00:00
> {code}
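
Not Hive code, just a small sketch of the time-zone dependence being described: the same epoch value rendered in the local zone of a Santa Clara host versus UTC.
{code}
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class EpochRenderingSketch {
  public static void main(String[] args) {
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    Instant epoch = Instant.ofEpochSecond(0);   // the integer 0 cast to a timestamp

    System.out.println(fmt.format(epoch.atZone(ZoneId.of("America/Los_Angeles")))); // 1969-12-31 16:00:00
    System.out.println(fmt.format(epoch.atZone(ZoneId.of("UTC"))));                 // 1970-01-01 00:00:00
  }
}
{code}
The second line is the consistent, standard-defined result the issue asks for.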



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch
>
>
> In phase 2, we are going to enable auto-gather of column stats by default. This 
> requires updating the golden files.
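
A small usage sketch, assuming {{hive.stats.column.autogather}} (the property introduced in phase 1) is the flag whose default is being flipped here:
{code}
-- sketch only: what auto-gathered column stats look like from the SQL side
SET hive.stats.column.autogather=true;   -- phase 2 would make this the default

CREATE TABLE t (a INT, b STRING);
INSERT INTO t VALUES (1, 'x');           -- column stats are collected as part of the insert
DESCRIBE FORMATTED t a;                  -- shows the auto-gathered statistics for column a
{code}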



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: HIVE-13567.03.patch

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch
>
>
> In phase 2, we are going to enable auto-gather of column stats by default. This 
> requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch
>
>
> In phase 2, we are going to enable auto-gather of column stats by default. This 
> requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16249) With column stats, mergejoin.q throws NPE

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933808#comment-15933808
 ] 

Hive QA commented on HIVE-16249:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859653/HIVE-16249.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10481 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4254/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4254/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4254/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859653 - PreCommit-HIVE-Build

> With column stats, mergejoin.q throws NPE
> -
>
> Key: HIVE-16249
> URL: https://issues.apache.org/jira/browse/HIVE-16249
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16249.01.patch
>
>
> stack trace:
> {code}
> 2017-03-17T16:00:26,356 ERROR [3d512d4d-72b5-48fc-92cb-0c72f7c876e5 main] 
> parse.CalcitePlanner: CBO failed, skipping CBO.
> java.lang.NullPointerException
> at 
> org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:719)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:123)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1866)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addToTop(LoptOptimizeJoinRule.java:1216)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16230) Enable CBO in presence of hints

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933800#comment-15933800
 ] 

Lefty Leverenz commented on HIVE-16230:
---

Should this be documented in the wiki?

* [Cost-based optimization in Hive | 
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive]

> Enable CBO in presence of hints
> ---
>
> Key: HIVE-16230
> URL: https://issues.apache.org/jira/browse/HIVE-16230
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-16230.1.patch, HIVE-16230.2.patch, 
> HIVE-16230.3.patch, HIVE-16230.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16230) Enable CBO in presence of hints

2017-03-20 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16230:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

> Enable CBO in presence of hints
> ---
>
> Key: HIVE-16230
> URL: https://issues.apache.org/jira/browse/HIVE-16230
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-16230.1.patch, HIVE-16230.2.patch, 
> HIVE-16230.3.patch, HIVE-16230.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15166) Provide beeline option to set the jline history max size

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933794#comment-15933794
 ] 

Lefty Leverenz commented on HIVE-15166:
---

Doc note:  This adds the Beeline option *--maxHistoryRows*, which needs to be 
documented in the wiki.

* [HiveServer2 Clients -- Beeline Command Options | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions]

Added a TODOC2.2 label.

> Provide beeline option to set the jline history max size
> 
>
> Key: HIVE-15166
> URL: https://issues.apache.org/jira/browse/HIVE-15166
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15166.2.patch, HIVE-15166.3.patch, HIVE-15166.patch
>
>
> Currently Beeline does not provide an option to limit the max size of the 
> Beeline history file; when each query is very large, it will flood 
> the history file and slow down Beeline on startup and shutdown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16251) Vectorization: new octet_length function (HIVE-15979) get NPE

2017-03-20 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16251:
--
Status: Patch Available  (was: Open)

> Vectorization: new octet_length function (HIVE-15979) get NPE
> -
>
> Key: HIVE-16251
> URL: https://issues.apache.org/jira/browse/HIVE-16251
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16251.1.patch
>
>
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.OctetLength.evaluate(OctetLength.java:53)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:125)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:900)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
> {code}
> I think it is the use of transient:
> {code}
>   private transient int colNum;
>   private transient int outputColumn;
> {code}
> Transient fields lose their value when the plan is serialized for execution.  
> At execution time both are 0.  Not sure why tests didn't fail.
> CC: [~ashutoshc]
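
A minimal, self-contained sketch (not taken from the patch) of the behavior described above: transient fields are skipped when an object is written out and come back as their defaults, so an int such as {{colNum}} reads as 0 on the execution side. The class and field names are illustrative only, and plain Java serialization stands in here for whatever mechanism actually ships the plan.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative only: shows why transient int fields come back as 0 after a
// serialization round trip, matching the symptom described in the report.
class TransientFieldDemo implements Serializable {
  private static final long serialVersionUID = 1L;

  private transient int colNum;   // skipped by serialization -> reads as 0 later
  private int outputColumn;       // serialized normally

  TransientFieldDemo(int colNum, int outputColumn) {
    this.colNum = colNum;
    this.outputColumn = outputColumn;
  }

  public static void main(String[] args) throws Exception {
    TransientFieldDemo before = new TransientFieldDemo(3, 7);

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(before);
    }
    TransientFieldDemo after;
    try (ObjectInputStream in =
             new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      after = (TransientFieldDemo) in.readObject();
    }

    System.out.println("colNum       after round trip: " + after.colNum);       // prints 0
    System.out.println("outputColumn after round trip: " + after.outputColumn); // prints 7
  }
}
{code}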



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16251) Vectorization: new octet_length function (HIVE-15979) get NPE

2017-03-20 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16251:
--
Attachment: HIVE-16251.1.patch

> Vectorization: new octet_length function (HIVE-15979) get NPE
> -
>
> Key: HIVE-16251
> URL: https://issues.apache.org/jira/browse/HIVE-16251
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16251.1.patch
>
>
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.OctetLength.evaluate(OctetLength.java:53)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:125)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:900)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_91]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_91]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
> {code}
> I think it is the use of transient:
> {code}
>   private transient int colNum;
>   private transient int outputColumn;
> {code}
> Transient fields lose their value when the plan is serialized for execution.  
> At execution time both are 0.  Not sure why tests didn't fail.
> CC: [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15082) Hive-1.2 cannot read data from complex data types with TIMESTAMP column, stored in Parquet

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15082:

Target Version/s: 1.2.2  (was: 1.3.0, 1.2.2)

> Hive-1.2 cannot read data from complex data types with TIMESTAMP column, 
> stored in Parquet
> --
>
> Key: HIVE-15082
> URL: https://issues.apache.org/jira/browse/HIVE-15082
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Blocker
> Attachments: HIVE-15082.1-branch-1.2.patch, 
> HIVE-15082.1-branch-1.2.patch, HIVE-15082-branch-1.2.patch, 
> HIVE-15082-branch-1.patch
>
>
> *STEP 1. Create test data*
> {code:sql}
> select * from dual;
> {code}
> *EXPECTED RESULT:*
> {noformat}
> Pretty_UnIQUe_StrinG
> {noformat}
> {code:sql}
> create table test_parquet1(login timestamp) stored as parquet;
> insert overwrite table test_parquet1 select from_unixtime(unix_timestamp()) 
> from dual;
> select * from test_parquet1 limit 1;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp as result.
> {noformat}
> 2016-10-27 10:58:19
> {noformat}
> *STEP 2. Store timestamp in array in parquet file*
> {code:sql}
> create table test_parquet2(x array<timestamp>) stored as parquet;
> insert overwrite table test_parquet2 select array(login) from test_parquet1;
> select * from test_parquet2;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp in brackets as result.
> {noformat}
> ["2016-10-27 10:58:19"]
> {noformat}
> *ACTUAL RESULT:*
> {noformat}
> ERROR [main]: CliDriver (SessionState.java:printError(963)) - Failed with 
> exception java.io.IOException:parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs:///user/hive/warehouse/test_parquet2/00_0
> java.io.IOException: parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *ROOT-CAUSE:*
> Incorrect initialization of the {{metadata}} {{HashMap}} leaves it {{null}} in the enumeration 
> {{org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter}} when the following line executes 
> in element {{ETIMESTAMP_CONVERTER}}:
> {code:java}
>   boolean skipConversion = 
> Boolean.valueOf(metadata.get(HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname));
> {code}
> The JVM throws an NPE, so the Parquet library cannot read data from the file and in turn throws 
> {noformat}
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *SOLUTION:*
> Perform the initialization in a separate method so that it is not overridden with a {{null}} 
> value in this block of code:
> {code:java}
>   if (parent != null) {
>  setMetadata(parent.getMetadata());
>   }
> {code}
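
To make the fix concrete, here is a hedged sketch of the defensive-initialization idea from the *SOLUTION* above: keep the metadata map non-null and only adopt the parent's map when it exists, so the skip-conversion lookup can never hit a null map. The class, field, and key names are placeholders, not the actual ETypeConverter code.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the defensive initialization described above; not the real
// ETypeConverter/HiveConf code. The key name below is a placeholder.
class ConverterMetadataSketch {
  private static final String SKIP_CONVERSION_KEY =
      "hive.parquet.timestamp.skip.conversion";   // placeholder for the real ConfVars name

  // Always initialized, so lookups never dereference a null map.
  private Map<String, String> metadata = new HashMap<>();

  void inheritMetadata(ConverterMetadataSketch parent) {
    // Only adopt the parent's map when the parent (and its map) actually exist.
    if (parent != null && parent.metadata != null) {
      this.metadata = parent.metadata;
    }
  }

  boolean skipConversion() {
    // Boolean.valueOf(null) is simply false, and metadata itself is never null,
    // so this call cannot throw the NPE described in the report.
    return Boolean.valueOf(metadata.get(SKIP_CONVERSION_KEY));
  }
}
{code}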



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15082) Hive-1.2 cannot read data from complex data types with TIMESTAMP column, stored in Parquet

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15082:

Target Version/s: 1.3.0  (was: 1.2.2)

> Hive-1.2 cannot read data from complex data types with TIMESTAMP column, 
> stored in Parquet
> --
>
> Key: HIVE-15082
> URL: https://issues.apache.org/jira/browse/HIVE-15082
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Blocker
> Attachments: HIVE-15082.1-branch-1.2.patch, 
> HIVE-15082.1-branch-1.2.patch, HIVE-15082-branch-1.2.patch, 
> HIVE-15082-branch-1.patch
>
>
> *STEP 1. Create test data*
> {code:sql}
> select * from dual;
> {code}
> *EXPECTED RESULT:*
> {noformat}
> Pretty_UnIQUe_StrinG
> {noformat}
> {code:sql}
> create table test_parquet1(login timestamp) stored as parquet;
> insert overwrite table test_parquet1 select from_unixtime(unix_timestamp()) 
> from dual;
> select * from test_parquet1 limit 1;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp as result.
> {noformat}
> 2016-10-27 10:58:19
> {noformat}
> *STEP 2. Store timestamp in array in parquet file*
> {code:sql}
> create table test_parquet2(x array<timestamp>) stored as parquet;
> insert overwrite table test_parquet2 select array(login) from test_parquet1;
> select * from test_parquet2;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp in brackets as result.
> {noformat}
> ["2016-10-27 10:58:19"]
> {noformat}
> *ACTUAL RESULT:*
> {noformat}
> ERROR [main]: CliDriver (SessionState.java:printError(963)) - Failed with 
> exception java.io.IOException:parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs:///user/hive/warehouse/test_parquet2/00_0
> java.io.IOException: parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *ROOT-CAUSE:*
> Incorrect initialization of the {{metadata}} {{HashMap}} leaves it {{null}} in the enumeration 
> {{org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter}} when the following line executes 
> in element {{ETIMESTAMP_CONVERTER}}:
> {code:java}
>   boolean skipConversion = 
> Boolean.valueOf(metadata.get(HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname));
> {code}
> The JVM throws an NPE, so the Parquet library cannot read data from the file and in turn throws 
> {noformat}
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *SOLUTION:*
> Perform the initialization in a separate method so that it is not overridden with a {{null}} 
> value in this block of code:
> {code:java}
>   if (parent != null) {
>  setMetadata(parent.getMetadata());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15082) Hive-1.2 cannot read data from complex data types with TIMESTAMP column, stored in Parquet

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933792#comment-15933792
 ] 

Vaibhav Gumashta edited comment on HIVE-15082 at 3/20/17 11:31 PM:
---

Removing target 1.2.2 and moving to 1.3.0. Please feel free to revert if you 
think this should go in 1.2.2 (or if this gets reviewed before RC for 1.2.2 is 
cut). Also would be great to determine if branch-2 needs this fix.


was (Author: vgumashta):
Removing target 1.2.2 and moving to 1.3.0. Please feel free to revert if you 
think this should go in 1.2.2 (or if this gets reviewed before RC for 1.2.2 is 
cut).

> Hive-1.2 cannot read data from complex data types with TIMESTAMP column, 
> stored in Parquet
> --
>
> Key: HIVE-15082
> URL: https://issues.apache.org/jira/browse/HIVE-15082
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Blocker
> Attachments: HIVE-15082.1-branch-1.2.patch, 
> HIVE-15082.1-branch-1.2.patch, HIVE-15082-branch-1.2.patch, 
> HIVE-15082-branch-1.patch
>
>
> *STEP 1. Create test data*
> {code:sql}
> select * from dual;
> {code}
> *EXPECTED RESULT:*
> {noformat}
> Pretty_UnIQUe_StrinG
> {noformat}
> {code:sql}
> create table test_parquet1(login timestamp) stored as parquet;
> insert overwrite table test_parquet1 select from_unixtime(unix_timestamp()) 
> from dual;
> select * from test_parquet1 limit 1;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp as result.
> {noformat}
> 2016-10-27 10:58:19
> {noformat}
> *STEP 2. Store timestamp in array in parquet file*
> {code:sql}
> create table test_parquet2(x array<timestamp>) stored as parquet;
> insert overwrite table test_parquet2 select array(login) from test_parquet1;
> select * from test_parquet2;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp in brackets as result.
> {noformat}
> ["2016-10-27 10:58:19"]
> {noformat}
> *ACTUAL RESULT:*
> {noformat}
> ERROR [main]: CliDriver (SessionState.java:printError(963)) - Failed with 
> exception java.io.IOException:parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs:///user/hive/warehouse/test_parquet2/00_0
> java.io.IOException: parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *ROOT-CAUSE:*
> Incorrect initialization of the {{metadata}} {{HashMap}} leaves it {{null}} in the enumeration 
> {{org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter}} when the following line executes 
> in element {{ETIMESTAMP_CONVERTER}}:
> {code:java}
>   boolean skipConversion = 
> Boolean.valueOf(metadata.get(HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname));
> {code}
> The JVM throws an NPE, so the Parquet library cannot read data from the file and in turn throws 
> {noformat}
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *SOLUTION:*
> Perform the initialization in a separate method so that it is not overridden with a {{null}} 
> value in this block of code:
> {code:java}
>   if (parent != null) {
>  setMetadata(parent.getMetadata());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15082) Hive-1.2 cannot read data from complex data types with TIMESTAMP column, stored in Parquet

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15082:

Target Version/s: 1.3.0, 1.2.2  (was: 1.2.2)

> Hive-1.2 cannot read data from complex data types with TIMESTAMP column, 
> stored in Parquet
> --
>
> Key: HIVE-15082
> URL: https://issues.apache.org/jira/browse/HIVE-15082
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Blocker
> Attachments: HIVE-15082.1-branch-1.2.patch, 
> HIVE-15082.1-branch-1.2.patch, HIVE-15082-branch-1.2.patch, 
> HIVE-15082-branch-1.patch
>
>
> *STEP 1. Create test data*
> {code:sql}
> select * from dual;
> {code}
> *EXPECTED RESULT:*
> {noformat}
> Pretty_UnIQUe_StrinG
> {noformat}
> {code:sql}
> create table test_parquet1(login timestamp) stored as parquet;
> insert overwrite table test_parquet1 select from_unixtime(unix_timestamp()) 
> from dual;
> select * from test_parquet1 limit 1;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp as result.
> {noformat}
> 2016-10-27 10:58:19
> {noformat}
> *STEP 2. Store timestamp in array in parquet file*
> {code:sql}
> create table test_parquet2(x array<timestamp>) stored as parquet;
> insert overwrite table test_parquet2 select array(login) from test_parquet1;
> select * from test_parquet2;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp in brackets as result.
> {noformat}
> ["2016-10-27 10:58:19"]
> {noformat}
> *ACTUAL RESULT:*
> {noformat}
> ERROR [main]: CliDriver (SessionState.java:printError(963)) - Failed with 
> exception java.io.IOException:parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs:///user/hive/warehouse/test_parquet2/00_0
> java.io.IOException: parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *ROOT-CAUSE:*
> Incorrect initialization of the {{metadata}} {{HashMap}} leaves it {{null}} in the enumeration 
> {{org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter}} when the following line executes 
> in element {{ETIMESTAMP_CONVERTER}}:
> {code:java}
>   boolean skipConversion = 
> Boolean.valueOf(metadata.get(HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname));
> {code}
> The JVM throws an NPE, so the Parquet library cannot read data from the file and in turn throws 
> {noformat}
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *SOLUTION:*
> Perform the initialization in a separate method so that it is not overridden with a {{null}} 
> value in this block of code:
> {code:java}
>   if (parent != null) {
>  setMetadata(parent.getMetadata());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15082) Hive-1.2 cannot read data from complex data types with TIMESTAMP column, stored in Parquet

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933792#comment-15933792
 ] 

Vaibhav Gumashta commented on HIVE-15082:
-

Removing target 1.2.2 and moving to 1.3.0. Please feel free to revert if you 
think this should go in 1.2.2 (or if this gets reviewed before RC for 1.2.2 is 
cut).

> Hive-1.2 cannot read data from complex data types with TIMESTAMP column, 
> stored in Parquet
> --
>
> Key: HIVE-15082
> URL: https://issues.apache.org/jira/browse/HIVE-15082
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Blocker
> Attachments: HIVE-15082.1-branch-1.2.patch, 
> HIVE-15082.1-branch-1.2.patch, HIVE-15082-branch-1.2.patch, 
> HIVE-15082-branch-1.patch
>
>
> *STEP 1. Create test data*
> {code:sql}
> select * from dual;
> {code}
> *EXPECTED RESULT:*
> {noformat}
> Pretty_UnIQUe_StrinG
> {noformat}
> {code:sql}
> create table test_parquet1(login timestamp) stored as parquet;
> insert overwrite table test_parquet1 select from_unixtime(unix_timestamp()) 
> from dual;
> select * from test_parquet1 limit 1;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp as result.
> {noformat}
> 2016-10-27 10:58:19
> {noformat}
> *STEP 2. Store timestamp in array in parquet file*
> {code:sql}
> create table test_parquet2(x array<timestamp>) stored as parquet;
> insert overwrite table test_parquet2 select array(login) from test_parquet1;
> select * from test_parquet2;
> {code}
> *EXPECTED RESULT:*
> No exceptions. Current timestamp in brackets as result.
> {noformat}
> ["2016-10-27 10:58:19"]
> {noformat}
> *ACTUAL RESULT:*
> {noformat}
> ERROR [main]: CliDriver (SessionState.java:printError(963)) - Failed with 
> exception java.io.IOException:parquet.io.ParquetDecodingException: Can not 
> read value at 0 in block -1 in file 
> hdfs:///user/hive/warehouse/test_parquet2/00_0
> java.io.IOException: parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *ROOT-CAUSE:*
> Incorrect initialization of the {{metadata}} {{HashMap}} leaves it {{null}} in the enumeration 
> {{org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter}} when the following line executes 
> in element {{ETIMESTAMP_CONVERTER}}:
> {code:java}
>   boolean skipConversion = 
> Boolean.valueOf(metadata.get(HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname));
> {code}
> The JVM throws an NPE, so the Parquet library cannot read data from the file and in turn throws 
> {noformat}
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file hdfs:///user/hive/warehouse/test_parquet2/00_0
> {noformat}
> *SOLUTION:*
> Perform the initialization in a separate method so that it is not overridden with a {{null}} 
> value in this block of code:
> {code:java}
>   if (parent != null) {
>  setMetadata(parent.getMetadata());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2017-03-20 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15166:
--
Labels: TODOC2.2  (was: )

> Provide beeline option to set the jline history max size
> 
>
> Key: HIVE-15166
> URL: https://issues.apache.org/jira/browse/HIVE-15166
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15166.2.patch, HIVE-15166.3.patch, HIVE-15166.patch
>
>
> Currently Beeline does not provide an option to limit the max size of the 
> Beeline history file; when each query is very large, it will flood 
> the history file and slow down Beeline on startup and shutdown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15766) DBNotificationlistener leaks JDOPersistenceManager

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15766:

  Resolution: Fixed
Hadoop Flags: Reviewed
Target Version/s:   (was: 2.2.0)
  Status: Resolved  (was: Patch Available)

Committed. Thanks [~mohitsabharwal] [~thejas] [~daijy] for the review.

> DBNotificationlistener leaks JDOPersistenceManager
> --
>
> Key: HIVE-15766
> URL: https://issues.apache.org/jira/browse/HIVE-15766
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 2.2.0
>
> Attachments: HIVE-15766.1.patch, HIVE-15766.2.patch, 
> HIVE-15766.3.patch, HIVE-15766.4.patch, HIVE-15766.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15947) Enhance Templeton service job operations reliability

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933780#comment-15933780
 ] 

Lefty Leverenz edited comment on HIVE-15947 at 3/20/17 11:23 PM:
-

Doc note:  This adds several WebHCat configuration properties, which need to be 
documented in the wiki.

* templeton.parallellism.job.submit
* templeton.parallellism.job.status
* templeton.parallellism.job.list
* templeton.job.submit.timeout
* templeton.job.status.timeout
* templeton.job.list.timeout
* templeton.job.timeout.task.retry.count
* templeton.job.timeout.task.retry.interval

* [WebHCat Configuration -- Configuration Variables | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables]

(To show the version, just copy what's done at the end of the table for 
templeton.hadoop.queue.name & templeton.mapper.memory.mb.)

Added a TODOC2.2 label.


was (Author: le...@hortonworks.com):
Doc note:  This adds several WebHCat configuration properties, which need to be 
documented in the wiki.

* templeton.parallellism.job.submit
* templeton.parallellism.job.status
* templeton.parallellism.job.list
* templeton.job.submit.timeout
* templeton.job.status.timeout
* templeton.job.list.timeout
* templeton.job.timeout.task.retry.count
* templeton.job.timeout.task.retry.interval

* [WebHCat Configuration -- Configuration Variables | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables]

(To show the version, just copy what's done at the end of the table for 
templeton.hadoop.queue.name & templeton.mapper.memory.mb.)

> Enhance Templeton service job operations reliability
> 
>
> Key: HIVE-15947
> URL: https://issues.apache.org/jira/browse/HIVE-15947
> Project: Hive
>  Issue Type: Improvement
>Reporter: Subramanyam Pattipaka
>Assignee: Subramanyam Pattipaka
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15947.10.patch, HIVE-15947.2.patch, 
> HIVE-15947.3.patch, HIVE-15947.4.patch, HIVE-15947.6.patch, 
> HIVE-15947.7.patch, HIVE-15947.8.patch, HIVE-15947.9.patch, HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation 
> requests; it simply accepts and tries to run all operations. If many concurrent 
> job submit requests come in, the time to submit job operations can increase 
> significantly. Templeton uses HDFS to store the staging file for a job. If HDFS 
> can't keep up with a large number of requests and throttles them, job submission 
> can take a very long time, on the order of minutes.
> This behavior may not be suitable for all applications; client applications may 
> expect a predictable, low-latency response for successful requests, or a throttle 
> response telling the client to wait for some time before re-requesting the job 
> operation.
> In this JIRA, I am trying to address the following job operations: 
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity due to differences in their use 
> of cluster resources like YARN/HDFS.
> The idea is to introduce a new config templeton.job.submit.exec.max-procs 
> which controls maximum number of concurrent active job submissions within 
> Templeton and use this config to control better response times. If a new job 
> submission request sees that there are already 
> templeton.job.submit.exec.max-procs jobs getting submitted concurrently then 
> the request will fail with Http error 503 with reason 
>“Too many concurrent job submission requests received. Please wait for 
> some time before retrying.”
>  
> The client is expected to catch this response and retry after waiting for 
> some time. The default value for the config 
> templeton.job.submit.exec.max-procs is set to ‘0’. This means by default job 
> submission requests are always accepted. The behavior needs to be enabled 
> based on requirements.
> We can have similar behavior for Status and List operations with configs 
> templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs 
> respectively.
> Once the job operation is started, the operation can take longer time. The 
> client which has requested for job operation may not be waiting for 
> indefinite amount of time. This work introduces configurations
> templeton.exec.job.submit.timeout
> templeton.exec.job.status.timeout
> templeton.exec.job.list.timeout
> to specify maximum amount of time job operation can execute. If time out 
> happens then list and status job requests returns to client with message
> "List job request got timed out. Please retry the operation after waiting for 
> some time."
> If submit job request gets timed out then 
>   i) The job submit request thread which receives 

[jira] [Updated] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16260:
--
Attachment: HIVE-16260.2.patch

Updated results for missed test cases.

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch, HIVE-16260.2.patch
>
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.
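
As a rough illustration of the two rules above (size the bloom filter for at least 1M entries, and disable the semijoin when the big table has fewer than 1M rows), here is a small sketch; the threshold would really come from configuration, and the class and method names are made up for this note.

{code:java}
// Rough sketch of the sizing/gating rules described above; thresholds and names
// are illustrative, and the real implementation reads them from configuration.
final class SemiJoinBloomSizingSketch {
  private static final long MIN_ENTRIES = 1_000_000L;

  private SemiJoinBloomSizingSketch() {
  }

  /** Returns -1 when the semijoin reduction should be disabled for this edge. */
  static long expectedBloomEntries(long bigTableRowCount, long estimatedDistinctKeys) {
    if (bigTableRowCount < MIN_ENTRIES) {
      return -1L;  // big table too small to benefit: disable the semijoin branch
    }
    // Otherwise size the bloom filter for at least 1M entries.
    return Math.max(MIN_ENTRIES, estimatedDistinctKeys);
  }
}
{code}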



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15947) Enhance Templeton service job operations reliability

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933780#comment-15933780
 ] 

Lefty Leverenz commented on HIVE-15947:
---

Doc note:  This adds several WebHCat configuration properties, which need to be 
documented in the wiki.

* templeton.parallellism.job.submit
* templeton.parallellism.job.status
* templeton.parallellism.job.list
* templeton.job.submit.timeout
* templeton.job.status.timeout
* templeton.job.list.timeout
* templeton.job.timeout.task.retry.count
* templeton.job.timeout.task.retry.interval

* [WebHCat Configuration -- Configuration Variables | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables]

(To show the version, just copy what's done at the end of the table for 
templeton.hadoop.queue.name & templeton.mapper.memory.mb.)

> Enhance Templeton service job operations reliability
> 
>
> Key: HIVE-15947
> URL: https://issues.apache.org/jira/browse/HIVE-15947
> Project: Hive
>  Issue Type: Improvement
>Reporter: Subramanyam Pattipaka
>Assignee: Subramanyam Pattipaka
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15947.10.patch, HIVE-15947.2.patch, 
> HIVE-15947.3.patch, HIVE-15947.4.patch, HIVE-15947.6.patch, 
> HIVE-15947.7.patch, HIVE-15947.8.patch, HIVE-15947.9.patch, HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation 
> requests; it simply accepts and tries to run all operations. If many concurrent 
> job submit requests come in, the time to submit job operations can increase 
> significantly. Templeton uses HDFS to store the staging file for a job. If HDFS 
> can't keep up with a large number of requests and throttles them, job submission 
> can take a very long time, on the order of minutes.
> This behavior may not be suitable for all applications; client applications may 
> expect a predictable, low-latency response for successful requests, or a throttle 
> response telling the client to wait for some time before re-requesting the job 
> operation.
> In this JIRA, I am trying to address the following job operations: 
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity due to differences in their use 
> of cluster resources like YARN/HDFS.
> The idea is to introduce a new config templeton.job.submit.exec.max-procs 
> which controls maximum number of concurrent active job submissions within 
> Templeton and use this config to control better response times. If a new job 
> submission request sees that there are already 
> templeton.job.submit.exec.max-procs jobs getting submitted concurrently then 
> the request will fail with Http error 503 with reason 
>“Too many concurrent job submission requests received. Please wait for 
> some time before retrying.”
>  
> The client is expected to catch this response and retry after waiting for 
> some time. The default value for the config 
> templeton.job.submit.exec.max-procs is set to ‘0’. This means by default job 
> submission requests are always accepted. The behavior needs to be enabled 
> based on requirements.
> We can have similar behavior for Status and List operations with configs 
> templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs 
> respectively.
> Once the job operation is started, the operation can take longer time. The 
> client which has requested for job operation may not be waiting for 
> indefinite amount of time. This work introduces configurations
> templeton.exec.job.submit.timeout
> templeton.exec.job.status.timeout
> templeton.exec.job.list.timeout
> to specify maximum amount of time job operation can execute. If time out 
> happens then list and status job requests returns to client with message
> "List job request got timed out. Please retry the operation after waiting for 
> some time."
> If submit job request gets timed out then 
>   i) The job submit request thread which receives time out will check if 
> valid job id is generated in job request.
>   ii) If it is generated then issue kill job request on cancel thread 
> pool. Don't wait for operation to complete and returns to client with time 
> out message. 
> Side effects of enabling time out for submit operations
> 1) This has a possibility for having active job for some time by the client 
> gets response and a list operation from client could potential show the newly 
> created job before it gets killed.
> 2) We do best effort to kill the job and no guarantees. This means there is a 
> possibility of duplicate job created. One possible reason for this could be a 
> case where job is created and then operation timed out but kill request 
> failed due to resource manager unavailability. When resource 

[jira] [Updated] (HIVE-15766) DBNotificationlistener leaks JDOPersistenceManager

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15766:

Fix Version/s: 2.2.0

> DBNotificationlistener leaks JDOPersistenceManager
> --
>
> Key: HIVE-15766
> URL: https://issues.apache.org/jira/browse/HIVE-15766
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 2.2.0
>
> Attachments: HIVE-15766.1.patch, HIVE-15766.2.patch, 
> HIVE-15766.3.patch, HIVE-15766.4.patch, HIVE-15766.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15947) Enhance Templeton service job operations reliability

2017-03-20 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15947:
--
Labels: TODOC2.2  (was: )

> Enhance Templeton service job operations reliability
> 
>
> Key: HIVE-15947
> URL: https://issues.apache.org/jira/browse/HIVE-15947
> Project: Hive
>  Issue Type: Improvement
>Reporter: Subramanyam Pattipaka
>Assignee: Subramanyam Pattipaka
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15947.10.patch, HIVE-15947.2.patch, 
> HIVE-15947.3.patch, HIVE-15947.4.patch, HIVE-15947.6.patch, 
> HIVE-15947.7.patch, HIVE-15947.8.patch, HIVE-15947.9.patch, HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation 
> requests; it simply accepts and tries to run all operations. If many concurrent 
> job submit requests come in, the time to submit job operations can increase 
> significantly. Templeton uses HDFS to store the staging file for a job. If HDFS 
> can't keep up with a large number of requests and throttles them, job submission 
> can take a very long time, on the order of minutes.
> This behavior may not be suitable for all applications; client applications may 
> expect a predictable, low-latency response for successful requests, or a throttle 
> response telling the client to wait for some time before re-requesting the job 
> operation.
> In this JIRA, I am trying to address the following job operations: 
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity due to differences in their use 
> of cluster resources like YARN/HDFS.
> The idea is to introduce a new config templeton.job.submit.exec.max-procs 
> which controls maximum number of concurrent active job submissions within 
> Templeton and use this config to control better response times. If a new job 
> submission request sees that there are already 
> templeton.job.submit.exec.max-procs jobs getting submitted concurrently then 
> the request will fail with Http error 503 with reason 
>“Too many concurrent job submission requests received. Please wait for 
> some time before retrying.”
>  
> The client is expected to catch this response and retry after waiting for 
> some time. The default value for the config 
> templeton.job.submit.exec.max-procs is set to ‘0’. This means by default job 
> submission requests are always accepted. The behavior needs to be enabled 
> based on requirements.
> We can have similar behavior for Status and List operations with configs 
> templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs 
> respectively.
> Once the job operation is started, the operation can take longer time. The 
> client which has requested for job operation may not be waiting for 
> indefinite amount of time. This work introduces configurations
> templeton.exec.job.submit.timeout
> templeton.exec.job.status.timeout
> templeton.exec.job.list.timeout
> to specify maximum amount of time job operation can execute. If time out 
> happens then list and status job requests returns to client with message
> "List job request got timed out. Please retry the operation after waiting for 
> some time."
> If submit job request gets timed out then 
>   i) The job submit request thread which receives time out will check if 
> valid job id is generated in job request.
>   ii) If it is generated then issue kill job request on cancel thread 
> pool. Don't wait for operation to complete and returns to client with time 
> out message. 
> Side effects of enabling time out for submit operations
> 1) This has a possibility for having active job for some time by the client 
> gets response and a list operation from client could potential show the newly 
> created job before it gets killed.
> 2) We do best effort to kill the job and no guarantees. This means there is a 
> possibility of duplicate job created. One possible reason for this could be a 
> case where job is created and then operation timed out but kill request 
> failed due to resource manager unavailability. When resource manager 
> restarts, it will restarts the job which got created.
> Fixing this scenario is not part of the scope of this JIRA. The job operation 
> functionality can be enabled only if above side effects are acceptable.
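
For readers skimming the mechanism, here is a hedged sketch of the max-procs idea described above: a bounded permit pool that rejects a submission with HTTP 503 semantics once the configured number of concurrent submits is in flight, with 0 meaning unlimited. It is not the actual Templeton code; the class and constant names are invented, and the real implementation also enforces the per-operation timeouts listed above.

{code:java}
import java.util.concurrent.Semaphore;

// Hedged sketch of the "max-procs" throttle described above; names are illustrative.
class JobSubmitThrottleSketch {
  static final int SC_SERVICE_UNAVAILABLE = 503;
  static final int SC_OK = 200;

  private final Semaphore permits;   // null => throttling disabled (limit of 0)

  JobSubmitThrottleSketch(int maxConcurrentSubmits) {
    this.permits = maxConcurrentSubmits > 0 ? new Semaphore(maxConcurrentSubmits) : null;
  }

  int submit(Runnable submitWork) {
    if (permits != null && !permits.tryAcquire()) {
      // Too many concurrent job submission requests; the caller should retry later.
      return SC_SERVICE_UNAVAILABLE;
    }
    try {
      submitWork.run();              // the real code would also enforce a submit timeout
      return SC_OK;
    } finally {
      if (permits != null) {
        permits.release();
      }
    }
  }
}
{code}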



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16247) Branch-1.2: Investigate failure of TestMiniTezCliDriver#tez_smb_empty

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933773#comment-15933773
 ] 

Vaibhav Gumashta commented on HIVE-16247:
-

Branch-1.2 has fix for this via HIVE-11356, but the test still fails with the 
same error as posted on HIVE-11356.

> Branch-1.2: Investigate failure of TestMiniTezCliDriver#tez_smb_empty
> -
>
> Key: HIVE-16247
> URL: https://issues.apache.org/jira/browse/HIVE-16247
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16190) Support expression in merge statement

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933769#comment-15933769
 ] 

Lefty Leverenz commented on HIVE-16190:
---

Doc note:  This should be documented in the Merge section of the DML wikidoc.

* [DML -- Merge | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge]

Added a TODOC2.2 label.

> Support expression in merge statement
> -
>
> Key: HIVE-16190
> URL: https://issues.apache.org/jira/browse/HIVE-16190
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Transactions
>Affects Versions: 2.2.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-16190.01.patch
>
>
> Right now, we only support atomExpression, rather than expression in values 
> in MergeStatement.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16190) Support expression in merge statement

2017-03-20 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16190:
--
Labels: TODOC2.2  (was: )

> Support expression in merge statement
> -
>
> Key: HIVE-16190
> URL: https://issues.apache.org/jira/browse/HIVE-16190
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Transactions
>Affects Versions: 2.2.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-16190.01.patch
>
>
> Right now, we only support atomExpression, rather than expression in values 
> in MergeStatement.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933758#comment-15933758
 ] 

Hive QA commented on HIVE-16260:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859648/HIVE-16260.1.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10480 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_semijoin_reduction2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_semijoin_reduction]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=94)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=94)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4253/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4253/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4253/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859648 - PreCommit-HIVE-Build

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch
>
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13780) Allow user to update AVRO table schema via command even if table's definition was defined through schema file

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933756#comment-15933756
 ] 

Lefty Leverenz commented on HIVE-13780:
---

Should this be documented in the wiki?  A new configuration parameter created 
in patch 0 is gone in the final patch, but perhaps some instructions should be 
added to the AVRO wikidoc:

* [Avro SerDe | https://cwiki.apache.org/confluence/display/Hive/AvroSerDe]

> Allow user to update AVRO table schema via command even if table's definition 
> was defined through schema file
> -
>
> Key: HIVE-13780
> URL: https://issues.apache.org/jira/browse/HIVE-13780
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 2.0.0
>Reporter: Eric Lin
>Assignee: Adam Szita
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13780.0.patch, HIVE-13780.1.patch, 
> HIVE-13780.3.patch
>
>
> If a table is defined as below:
> {code}
> CREATE TABLE test
> STORED AS AVRO 
> TBLPROPERTIES ('avro.schema.url'='/tmp/schema.json');
> {code}
> if user tries to run command:
> {code}
> ALTER TABLE test CHANGE COLUMN col1 col1 STRING COMMENT 'test comment';
> {code}
> The query will return without any warning, but has no effect on the table.
> It would be good if we can allow user to ALTER table (add/change column, 
> update comment etc) even though the schema is defined through schema file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16248) Branch-1.2: Investigate failure of TestMiniTezCliDriver#bucket_map_join_tez1

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-16248.
-
Resolution: Not A Problem

This is also just a diff in explain output. Closing this.

> Branch-1.2: Investigate failure of TestMiniTezCliDriver#bucket_map_join_tez1
> 
>
> Key: HIVE-16248
> URL: https://issues.apache.org/jira/browse/HIVE-16248
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16257) Intermittent issue with incorrect resultset with Spark

2017-03-20 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933741#comment-15933741
 ] 

Xuefu Zhang commented on HIVE-16257:


[~ngangam], This does sound like a concurrency issue. If you are able to 
reproduce the issue, it would be helpful to get the yarn logs. Ideally, you can 
get logs for one run that doesn't produce the problem. Thanks.

> Intermittent issue with incorrect resultset with Spark
> --
>
> Key: HIVE-16257
> URL: https://issues.apache.org/jira/browse/HIVE-16257
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>
> This issue is highly intermittent and only seems to occur with the Spark engine 
> when the query has a GROUP BY clause. The following is the test case.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 
> decimal(20,3));
> insert into test_hos_sample values 
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select  name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   101.12  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> Incorrect results once in a while:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   104.52  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is String or double. Only 
> reproducible with decimal data types. Also works fine for decimal datatype if 
> you cast decimal as string on read and cast it back to decimal on select.
> 4) Occurs with parquet and text file formats as well (haven't tried other 
> formats).
> 5) Occurs in both scenarios when table data is within encryption zone and 
> outside.
> 6) Even in clusters where this is reproducible, this occurs once in like 20 
> times or more.
> 7) Occurs with both Beeline and Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14796) MetastoreEventListener - OnGrant() and OnRevoke() Events required for capturing the event on grant and revoke operation on the table in hive.

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933726#comment-15933726
 ] 

Lefty Leverenz commented on HIVE-14796:
---

[~tanping], you might get a quicker response by writing to d...@hive.apache.org 
or u...@hive.apache.org.

JIRA comments tend to get less attention than direct messages to the mailing 
lists.  Besides, Ashutosh doesn't necessarily speak for the whole community.

You can preface your subject line with \[DISCUSS\] to draw attention to it.  
(Backslashes added for the JIRA UI, don't include them if you're seeing this in 
a mailing list.)

> MetastoreEventListener - OnGrant() and OnRevoke() Events required for 
> capturing the event on grant and revoke operation on the table in hive. 
> --
>
> Key: HIVE-14796
> URL: https://issues.apache.org/jira/browse/HIVE-14796
> Project: Hive
>  Issue Type: New Feature
>  Components: Authorization
>Affects Versions: 1.2.1
> Environment: RHEL6 and RHEL7
>Reporter: Rahul Dhote
>
> When privileges are granted or revoked on a table, onGrant and onRevoke 
> methods are required inside the MetastoreEventListener to capture the events.
> It would be useful to perform certain operations on these basic authorization 
> events. 
> Ex:
>  /**
>* @param OnGrantEvent grant event
>* @throws MetaException
>*/
>   public void onGrant(GrantEvent grantEvent) throws MetaException {
>   }
>   /**
>* @param OnRevoke revoke event
>* @throws MetaException
>*/
>   public void onRevoke(RevokeEvent revokeEvent) throws MetaException {
>   }



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-20 Thread Sunitha Beeram (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933458#comment-15933458
 ] 

Sunitha Beeram edited comment on HIVE-16206 at 3/20/17 10:26 PM:
-

Test failure unrelated to updated code: 
https://builds.apache.org/job/PreCommit-HIVE-Build/4248/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_comments_/
The test has failed for the past 9 builds. Failure tracked via HIVE-16256


was (Author: sbeeram):
Test failure unrelated to updated code: 
https://builds.apache.org/job/PreCommit-HIVE-Build/4248/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_comments_/
The test has failed for the past 9 builds.

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers - ie, handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - currently supported ones are Console, JMX, JSON file 
> and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this conf driven just the way MetricsFactory 
> handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (and if the new option 
> is not present, look for the old option and instantiate accordingly) - ie, 
> make the code backward compatible.
> - Update and add new tests.
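
A hedged sketch of the conf-driven pattern described in the scope of work: read a comma-separated list of reporter class names from the new option and instantiate each by reflection, falling back to the legacy option when the new one is unset. The property names and the {{Reporter}} interface below are placeholders, and the fallback is simplified (the real legacy option holds reporter names such as JMX or CONSOLE rather than class names).

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hedged sketch only: placeholder property names and interface, not the actual
// Hive/Codahale types.
class PluggableReporterLoaderSketch {
  interface Reporter { void start(); }

  static List<Reporter> loadReporters(Properties conf) throws Exception {
    String classes = conf.getProperty("hive.codahale.reporter.classes");
    if (classes == null || classes.isEmpty()) {
      // Backward compatibility: fall back to the older option (simplified here;
      // the real legacy option names reporters rather than classes).
      classes = conf.getProperty("hive.metrics.reporter", "");
    }
    List<Reporter> reporters = new ArrayList<>();
    for (String name : classes.split(",")) {
      name = name.trim();
      if (name.isEmpty()) {
        continue;
      }
      // Each configured class is expected to implement Reporter and to expose a
      // no-argument constructor in this simplified sketch.
      reporters.add((Reporter) Class.forName(name).getDeclaredConstructor().newInstance());
    }
    return reporters;
  }
}
{code}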



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11554:

Target Version/s: 1.3.0  (was: 1.3.0, 1.2.2)

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
> Fix For: 1.3.0, 2.2.0
>
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.
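
To illustrate what a hook would see, here is a hedged sketch of a post-execute check over the read/write entity sets; the {{QueryHookContext}} type and its accessors are placeholders standing in for Hive's actual hook interfaces, which expose equivalent input/output entity collections.

{code:java}
import java.util.Set;

// Hedged sketch only: placeholder types stand in for Hive's actual hook
// interfaces, which expose the query's read/write entity sets to the hook.
class ExchangePartitionAuditSketch {
  interface QueryHookContext {
    String getOperationName();
    Set<String> getReadEntityNames();   // e.g. "db@exchange_part_test2@ds=2013-04-05"
    Set<String> getWriteEntityNames();  // e.g. "db@exchange_part_test1"
  }

  void onPostExecute(QueryHookContext ctx) {
    // The bug described above: for EXCHANGE PARTITION both sets arrive empty,
    // so an auditing or replication hook has nothing to act on.
    if (ctx.getReadEntityNames().isEmpty() && ctx.getWriteEntityNames().isEmpty()) {
      System.err.println("WARN: no read/write entities reported for "
          + ctx.getOperationName());
    }
  }
}
{code}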



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11554:

Fix Version/s: 2.2.0
   1.3.0

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
> Fix For: 1.3.0, 2.2.0
>
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Target Version/s: 1.3.0  (was: 1.3.0, 2.2.0)

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Target Version/s: 1.3.0, 2.2.0  (was: 1.3.0, 1.2.2, 2.2.0)

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933701#comment-15933701
 ] 

Vaibhav Gumashta commented on HIVE-14348:
-

For 1.2.2 this will also need HIVE-12215, which has a metastore API change, 
and it doesn't make sense to backport that to 1.2.2. Removing target 1.2.2.

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16249) With column stats, mergejoin.q throws NPE

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16249:
---
Status: Patch Available  (was: Open)

> With column stats, mergejoin.q throws NPE
> -
>
> Key: HIVE-16249
> URL: https://issues.apache.org/jira/browse/HIVE-16249
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16249.01.patch
>
>
> stack trace:
> {code}
> 2017-03-17T16:00:26,356 ERROR [3d512d4d-72b5-48fc-92cb-0c72f7c876e5 main] 
> parse.CalcitePlanner: CBO failed, skipping CBO.
> java.lang.NullPointerException
> at 
> org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:719)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:123)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1866)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addToTop(LoptOptimizeJoinRule.java:1216)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16249) With column stats, mergejoin.q throws NPE

2017-03-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933692#comment-15933692
 ] 

Pengcheng Xiong commented on HIVE-16249:


[~ashutoshc], could you take a look? Thanks.

> With column stats, mergejoin.q throws NPE
> -
>
> Key: HIVE-16249
> URL: https://issues.apache.org/jira/browse/HIVE-16249
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16249.01.patch
>
>
> stack trace:
> {code}
> 2017-03-17T16:00:26,356 ERROR [3d512d4d-72b5-48fc-92cb-0c72f7c876e5 main] 
> parse.CalcitePlanner: CBO failed, skipping CBO.
> java.lang.NullPointerException
> at 
> org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:719)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:123)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1866)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addToTop(LoptOptimizeJoinRule.java:1216)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16249) With column stats, mergejoin.q throws NPE

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16249:
---
Attachment: HIVE-16249.01.patch

> With column stats, mergejoin.q throws NPE
> -
>
> Key: HIVE-16249
> URL: https://issues.apache.org/jira/browse/HIVE-16249
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16249.01.patch
>
>
> stack trace:
> {code}
> 2017-03-17T16:00:26,356 ERROR [3d512d4d-72b5-48fc-92cb-0c72f7c876e5 main] 
> parse.CalcitePlanner: CBO failed, skipping CBO.
> java.lang.NullPointerException
> at 
> org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:719)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:123)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) 
> ~[?:?]
> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) 
> ~[?:?]
> at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1866)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.addToTop(LoptOptimizeJoinRule.java:1216)
>  ~[calcite-core-1.10.0.jar:1.10.0]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16207) Add support for Complex Types in Fast SerDe

2017-03-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16207:

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-15468

> Add support for Complex Types in Fast SerDe
> ---
>
> Key: HIVE-16207
> URL: https://issues.apache.org/jira/browse/HIVE-16207
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: partial.patch
>
>
> Add complex type support to Fast SerDe classes. This is needed for fully 
> supporting complex types in Vectorization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16209) Vectorization: Add support for complex types to VectorExtractRow and VectorAssignRow

2017-03-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16209:

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-15468

> Vectorization: Add support for complex types to VectorExtractRow and 
> VectorAssignRow
> 
>
> Key: HIVE-16209
> URL: https://issues.apache.org/jira/browse/HIVE-16209
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> Supports complex types in non-native VectorReduceSink, row mode Text 
> Vectorization, and some cases of Vectorized Schema Evolution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16260:
--
Attachment: HIVE-16260.1.patch

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch
>
>
> Remove parallel edges of semijoin with map joins, as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries, 
> and that the semijoin is disabled if the big table has fewer than 1M rows.
> Both of these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16260:
--
Status: Patch Available  (was: In Progress)

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch
>
>
> Remove parallel edges of semijoin with map joins, as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries, 
> and that the semijoin is disabled if the big table has fewer than 1M rows.
> Both of these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933647#comment-15933647
 ] 

Lefty Leverenz commented on HIVE-16024:
---

Does this need any documentation in the wiki?  It looks like a simple bug fix, 
but here are the relevant doc links just in case:

* [Recover Partitions (MSCK REPAIR TABLE) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)]
* [Configuration Properties -- hive.mapred.mode | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapred.mode]

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve 
> performance. Unfortunately it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933643#comment-15933643
 ] 

Hive QA commented on HIVE-14348:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12820687/HIVE-14348.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4252/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4252/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4252/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-03-20 21:46:07.990
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4252/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-03-20 21:46:07.993
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 7ea85a0 HIVE-16227: GenMRFileSink1.java may refer to a wrong MR 
task in multi-insert case (Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 7ea85a0 HIVE-16227: GenMRFileSink1.java may refer to a wrong MR 
task in multi-insert case (Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-03-20 21:46:09.085
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java: 
No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java: No 
such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java: No such 
file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java:
 No such file or directory
error: a/ql/src/test/results/clientnegative/exchange_partition.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchange_partition.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchange_partition2.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchange_partition3.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchgpartition2lel.q.out: No such 
file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12820687 - PreCommit-HIVE-Build

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Priority: Blocker  (was: Major)

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933639#comment-15933639
 ] 

Hive QA commented on HIVE-16178:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859631/HIVE-16178.2.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4251/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4251/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4251/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859631 - PreCommit-HIVE-Build

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16178.1.patch, HIVE-16178.2.patch
>
>
> h3. corr
> The standard defines corner cases in which it should return null, but the 
> current result is NaN:
> If N * SUMX2 equals SUMX * SUMX, then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY, then the result is the null value.
> h3. covar_samp
> Returns 0 instead of null when N is 1, but the standard says:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> The standard uses (y,x) order, and some of the function names also contain 
> X and Y, so the order does matter. Currently at least corr uses (x,y) order, 
> which is okay because it's symmetric, but it would be great to have the same 
> order everywhere (check the others).
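
For illustration only, a plain-Java sketch (not Hive's UDAF code; names are 
made up) of the corner case above - when N * SUMX2 equals SUMX * SUMX the 
x-variance term is zero, so the standard asks for NULL rather than the NaN 
that a naive division produces:

{code}
// Minimal sample-correlation sketch with the standard-mandated NULL guard.
public final class CorrSketch {
  static Double corr(double[] x, double[] y) {
    long n = x.length;
    double sumX = 0, sumY = 0, sumX2 = 0, sumY2 = 0, sumXY = 0;
    for (int i = 0; i < n; i++) {
      sumX += x[i];
      sumY += y[i];
      sumX2 += x[i] * x[i];
      sumY2 += y[i] * y[i];
      sumXY += x[i] * y[i];
    }
    double varXTerm = n * sumX2 - sumX * sumX;
    double varYTerm = n * sumY2 - sumY * sumY;
    if (varXTerm == 0 || varYTerm == 0) {
      return null; // the standard requires NULL here, not NaN
    }
    return (n * sumXY - sumX * sumY) / Math.sqrt(varXTerm * varYTerm);
  }

  public static void main(String[] args) {
    // x is constant, so n * sumX2 == sumX * sumX and the result is null.
    System.out.println(corr(new double[]{1, 1, 1}, new double[]{1, 2, 3}));
  }
}
{code}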



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16258) Suggesting a non-standard extension to MERGE

2017-03-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16258:
--
Affects Version/s: 2.2.0

> Suggesting a non-standard extension to MERGE
> 
>
> Key: HIVE-16258
> URL: https://issues.apache.org/jira/browse/HIVE-16258
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>
> Some common data maintenance strategies, especially the Type 2 SCD update, 
> would become substantially easier with a small extension to the SQL standard 
> for MERGE, specifically the ability to say "when matched then insert". Per 
> the standard, matched records can only be updated or deleted.
> In the Type 2 SCD, when a new record comes in you update the old version of 
> the record and insert the new version of the same record. If this extension 
> were supported, sample Type 2 SCD code would look as follows:
> {code}
> merge into customer
> using new_customer_stage stage
> on stage.source_pk = customer.source_pk
> when not matched then insert values /* Insert a net new record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null)
> when matched then update set   /* Update an old record to mark it as 
> out-of-date */
>   is_current = false, end_date = current_date()
> when matched then insert values /* Insert a new current record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null);
> {code}
> Without this support, the user needs to devise some sort of workaround. A 
> common approach is to first left join the staging table against the table to 
> be updated, then to join these results to a helper table that will spit out 
> two records for each match and one record for each miss. One of the matching 
> records needs to have a join key that can never occur in the source data so 
> this requires precise knowledge of the source dataset.
> An example of this:
> {code}
> merge into customer
> using (
>   select
> *,
> coalesce(invalid_key, source_pk) as join_key
>   from (
> select
>   stage.source_pk, stage.name, stage.state,
>   case when customer.source_pk is null then 1
>   when stage.name <> customer.name or stage.state <> customer.state then 2
>   else 0 end as scd_row_type
> from
>   new_customer_stage stage
> left join
>   customer
> on (stage.source_pk = customer.source_pk and customer.is_current = true)
>   ) updates
>   join scd_types on scd_types.type = scd_row_type
> ) sub
> on sub.join_key = customer.source_pk
> when matched then update set
>   is_current = false,
>   end_date = current_date()
> when not matched then insert values
>   (sub.source_pk, upper(substr(sub.name, 0, 3)), sub.name, sub.state, true, 
> null);
> select * from customer order by source_pk;
> {code}
> This code is very complicated and will fail if the "invalid" key ever shows 
> up in the source dataset. This simple extension provides a lot of value and 
> likely very little maintenance overhead.
> /cc [~ekoifman]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16258) Suggesting a non-standard extension to MERGE

2017-03-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16258:
--
Component/s: Transactions

> Suggesting a non-standard extension to MERGE
> 
>
> Key: HIVE-16258
> URL: https://issues.apache.org/jira/browse/HIVE-16258
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Carter Shanklin
>
> Some common data maintenance strategies, especially the Type 2 SCD update, 
> would become substantially easier with a small extension to the SQL standard 
> for MERGE, specifically the ability to say "when matched then insert". Per 
> the standard, matched records can only be updated or deleted.
> In the Type 2 SCD, when a new record comes in you update the old version of 
> the record and insert the new version of the same record. If this extension 
> were supported, sample Type 2 SCD code would look as follows:
> {code}
> merge into customer
> using new_customer_stage stage
> on stage.source_pk = customer.source_pk
> when not matched then insert values /* Insert a net new record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null)
> when matched then update set   /* Update an old record to mark it as 
> out-of-date */
>   is_current = false, end_date = current_date()
> when matched then insert values /* Insert a new current record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null);
> {code}
> Without this support, the user needs to devise some sort of workaround. A 
> common approach is to first left join the staging table against the table to 
> be updated, then to join these results to a helper table that will spit out 
> two records for each match and one record for each miss. One of the matching 
> records needs to have a join key that can never occur in the source data so 
> this requires precise knowledge of the source dataset.
> An example of this:
> {code}
> merge into customer
> using (
>   select
> *,
> coalesce(invalid_key, source_pk) as join_key
>   from (
> select
>   stage.source_pk, stage.name, stage.state,
>   case when customer.source_pk is null then 1
>   when stage.name <> customer.name or stage.state <> customer.state then 2
>   else 0 end as scd_row_type
> from
>   new_customer_stage stage
> left join
>   customer
> on (stage.source_pk = customer.source_pk and customer.is_current = true)
>   ) updates
>   join scd_types on scd_types.type = scd_row_type
> ) sub
> on sub.join_key = customer.source_pk
> when matched then update set
>   is_current = false,
>   end_date = current_date()
> when not matched then insert values
>   (sub.source_pk, upper(substr(sub.name, 0, 3)), sub.name, sub.state, true, 
> null);
> select * from customer order by source_pk;
> {code}
> This code is very complicated and will fail if the "invalid" key ever shows 
> up in the source dataset. This simple extension provides a lot of value and 
> likely very little maintenance overhead.
> /cc [~ekoifman]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Target Version/s: 1.3.0, 1.2.2, 2.2.0

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-11554.
-
Resolution: Duplicate

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Affects Version/s: (was: 2.1.0)
   2.1.1

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16252) Vectorization: Cannot vectorize: Aggregation Function UDF avg

2017-03-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933581#comment-15933581
 ] 

Zoltan Haindrich commented on HIVE-16252:
-

I think this is by design... the avg UDAF uses an internal temporary type to 
communicate after PARTIAL1; the {{struct}} shown in the log is that format. 
Since it would be tricky to process this format in vectorized mode, the work 
is left to the standard UDAF; however, the message is a bit misleading: it 
tries to apply it to an aggregate in FINAL mode, which doesn't seem right.

[~rajesh.balamohan] did it cause any trouble?
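
For context, a generic two-phase average sketch (plain Java, not Hive's 
GenericUDAFAverage): the partial result is a (count, sum) pair, which is the 
struct-typed intermediate that the vectorizer refuses to handle between the 
map-side and reduce-side aggregations:

{code}
// Two-phase (partial/final) average, mirroring the PARTIAL1 -> FINAL flow.
public final class TwoPhaseAvgSketch {
  static final class Partial {
    long count;
    double sum;
  }

  static void iterate(Partial p, double value) { // map side: fold one row in
    p.count++;
    p.sum += value;
  }

  static void merge(Partial into, Partial other) { // reduce side: combine partials
    into.count += other.count;
    into.sum += other.sum;
  }

  static Double terminate(Partial p) { // FINAL: produce the scalar average
    return p.count == 0 ? null : p.sum / p.count;
  }

  public static void main(String[] args) {
    Partial a = new Partial();
    Partial b = new Partial();
    iterate(a, 2.0);
    iterate(a, 4.0);
    iterate(b, 6.0);
    merge(a, b);
    System.out.println(terminate(a)); // 4.0
  }
}
{code}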

> Vectorization: Cannot vectorize: Aggregation Function UDF avg 
> --
>
> Key: HIVE-16252
> URL: https://issues.apache.org/jira/browse/HIVE-16252
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> {noformat}
> select 
> ss_store_sk, ss_item_sk, avg(ss_sales_price) as revenue
> from
> store_sales, date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_month_seq between 1212 and 1212 + 11
> group by ss_store_sk , ss_item_sk limit 100;
> 2017-03-20T00:59:49,526  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Validating ReduceWork...
> 2017-03-20T00:59:49,526 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Using reduce tag 0
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> lazybinary.LazyBinarySerDe: LazyBinarySerDe initialized with: 
> columnNames=[_col0] columnTypes=[struct]
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> vector.VectorizationContext: Input Expression = Column[KEY._col0], Vectorized 
> Expression = col 0
> ...
> ...
> 2017-03-20T00:59:49,528  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Cannot vectorize: Aggregation Function UDF avg parameter 
> expression for GROUPBY operator: Data type 
> struct of Column[VALUE._col0] not 
> supported
> {noformat}
> Env: Hive build from: commit 71f4930d95475e7e63b5acc55af3809aefcc71e0 (march 
> 16)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-20 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933583#comment-15933583
 ] 

Misha Dmitriev commented on HIVE-16166:
---

I ran 'mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_if_expr.q' 
locally, and it passed.

I then checked the hive log at 
http://104.198.109.242/logs/PreCommit-HIVE-Build-4250/failed/141-TestMiniLlapLocalCliDriver-skewjoinopt15.q-vector_coalesce.q-orc_ppd_decimal.q-and-27-more/logs/hive.log
It does have a bunch of exception stack traces, but it doesn't look like they 
are related to my changes. At least I don't see 'StringInternUtils' (my class, 
where an NPE or some such is most likely to happen), and the NPEs across this 
log are all of the same type and show no traces of the code that I've 
modified. I can't see where in this log the problematic test (vector_if_expr) 
starts - or do all the tests run in parallel?

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.
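
For illustration, a tiny self-contained example of why interning helps when 
many logically equal strings are held at once (the class and literal here are 
made up, not the actual HS2 call sites):

{code}
import java.util.ArrayList;
import java.util.List;

// new String(...) forces a distinct object per iteration; intern() collapses
// them all onto a single pooled copy, so the list holds one string instance.
public final class InternSketch {
  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < 1_000_000; i++) {
      names.add(new String("ss_sold_date_sk").intern());
    }
    System.out.println(names.get(0) == names.get(999_999)); // true: same object
  }
}
{code}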



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933568#comment-15933568
 ] 

Vaibhav Gumashta edited comment on HIVE-11554 at 3/20/17 9:09 PM:
--

Modifying target version to add 1.3.0 since this is not a regression and 1.2.2 
is a minor version update. If I'm able to commit this before the 1.2.2 RC is 
cut, I'll commit it to the 1.2 branch.


was (Author: vgumashta):
Modifying target version to 1.3.0 since this is not a regression and 1.2.2 is a 
minor version update. If I'm able to commit this before the 1.2.2 RC is cut, 
I'll commit it to the 1.2 branch.

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933568#comment-15933568
 ] 

Vaibhav Gumashta commented on HIVE-11554:
-

Modifying target version to 1.3.0 since this is not a regression and 1.2.2 is a 
minor version update. If I'm able to commit this before the 1.2.2 RC is cut, 
I'll commit it to the 1.2 branch.

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

