[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515494#comment-15515494
 ] 

Lefty Leverenz commented on HIVE-14801:
---

[~thejas], you committed this to master, so it needs a status update.

(Commit 0c392b185d98b4fb380a33a535b5f528625a47e8.)

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
>
> TestPartitionNameWhitelistValidation uses a remote metastore. However, there 
> can be multiple issues around starting a remote metastore, including race 
> conditions in finding an available port. In addition, all the initialization 
> done at remote metastore startup is likely to make the test case take 
> longer.
> This test case doesn't need a remote metastore, so it should be moved to using 
> an embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14580) Introduce || operator

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515271#comment-15515271
 ] 

Hive QA commented on HIVE-14580:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829933/HIVE-14580.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1283/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1283/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1283/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829933 - PreCommit-HIVE-Build

> Introduce || operator
> -
>
> Key: HIVE-14580
> URL: https://issues.apache.org/jira/browse/HIVE-14580
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14580.1.patch
>
>
> Functionally equivalent to the concat() UDF, but the SQL standard allows using 
> || for string concatenation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515196#comment-15515196
 ] 

Ferdinand Xu commented on HIVE-5867:


Hi [~JonnyR], can you attach the file to this ticket as an attachment to trigger 
the Jenkins job?

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch
>
>
> HiveCLI supports the .hiverc script that is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for Beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that is 
> automatically executed after connecting. The script path can be specified via the 
> JDBC connection URL. For example: 
> {noformat}
> jdbc:hive2://localhost:10000/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added as a Beeline command line option like "-i 
> /home/user1/scripts/init.sql"
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc
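A minimal sketch of how the proposed URL parameter might be used from Java, assuming HiveServer2 on its default port and the Hive JDBC driver on the classpath; the initScript parameter follows the example above and is a proposal in this ticket, not a released option:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InitScriptExample {
  public static void main(String[] args) throws Exception {
    // initScript is the parameter proposed in this ticket (hypothetical).
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default;initScript=/home/user1/scripts/init.sql",
        "user1", "");
    // By the time the connection is returned, the statements in init.sql
    // (e.g. ADD JAR, SET ...) would already have been executed.
    try (Statement st = conn.createStatement()) {
      st.execute("SHOW TABLES");
    }
    conn.close();
  }
}
{code}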



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0

2016-09-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515155#comment-15515155
 ] 

Rui Li commented on HIVE-14825:
---

Thanks [~Ferd] for tracking this. I expect the minimum set to be fairly small :)

> Figure out the minimum set of required jars for Hive on Spark after bumping 
> up to Spark 2.0.0
> -
>
> Key: HIVE-14825
> URL: https://issues.apache.org/jira/browse/HIVE-14825
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>
> Considering that there's no assembly jar for Spark since 2.0.0, we should 
> figure out the minimum set of required jars for HoS to work after bumping up 
> to Spark 2.0.0. That way, users can decide whether they want to add just 
> the required jars, or all the jars under Spark's dir for convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515139#comment-15515139
 ] 

Hive QA commented on HIVE-14820:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829923/HIVE-14820.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.hcatalog.mapreduce.TestHCatMultiOutputFormat.testOutputFormat
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1282/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1282/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1282/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829923 - PreCommit-HIVE-Build

> RPC server for spark inside HS2 is not getting server address properly
> --
>
> Key: HIVE-14820
> URL: https://issues.apache.org/jira/browse/HIVE-14820
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14820.1.patch
>
>
> When hive.spark.client.rpc.server.address is configured, this property is not 
> retrieved properly because we get the value via {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}}, which always 
> returns null in the getServerAddress() call of RpcConfiguration.java. Rather, it 
> should be {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}.
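A minimal sketch of why the two lookups differ; the enum and map below are stand-ins that mirror the shape of HiveConf.ConfVars and RpcConfiguration's string-keyed config, not Hive's actual classes:

{code}
import java.util.HashMap;
import java.util.Map;

public class ConfLookupSketch {
  // Stand-in mirroring HiveConf.ConfVars: an enum constant wrapping a key string.
  enum ConfVars {
    SPARK_RPC_SERVER_ADDRESS("hive.spark.client.rpc.server.address");
    final String varname;
    ConfVars(String varname) { this.varname = varname; }
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put("hive.spark.client.rpc.server.address", "gateway-host");

    // Map.get(Object) happily accepts the enum itself, but no String key
    // ever equals an enum value, so this lookup always returns null.
    System.out.println(config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS));         // null
    // Looking up the wrapped key string finds the configured value.
    System.out.println(config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS.varname)); // gateway-host
  }
}
{code}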



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515131#comment-15515131
 ] 

Ferdinand Xu commented on HIVE-14029:
-

Hi [~lirui], [~xuefuz], HIVE-14825 was created to address this.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn't accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
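A minimal sketch of the first API change in the list above (Iterable -> Iterator); the interface shapes are illustrative stand-ins, not Hive's actual SparkShuffler:

{code}
import java.util.Arrays;
import java.util.Iterator;

public class ShuffleApiSketch {
  // Pre-2.0 style: the shuffle function consumed and produced Iterables.
  interface ShufflerV1<T> { Iterable<T> call(Iterable<T> input); }
  // Spark 2.0 style: it consumes and produces Iterators instead.
  interface ShufflerV2<T> { Iterator<T> call(Iterator<T> input); }

  public static void main(String[] args) {
    ShufflerV2<String> identity = input -> input; // must now return an Iterator
    Iterator<String> out = identity.call(Arrays.asList("a", "b").iterator());
    while (out.hasNext()) {
      System.out.println(out.next());
    }
  }
}
{code}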



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-22 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: HIVE-14029.5.patch

Hi [~spena], it's weird that Jenkins can build it successfully. Hi [~lirui], I 
excluded the {code}javax.ws.rs{code} dependency imported by spark-core in the 5th patch.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn't accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14818) Reduce number of retries while starting HiveServer for tests

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14818:
--
Attachment: HIVE-14818.02.patch

Updated the patch to fix the enum reference order. Agree that 30m is too much - 
I don't know why any restarts are attempted, but I don't plan to change that here.

> Reduce number of retries while starting HiveServer for tests
> 
>
> Key: HIVE-14818
> URL: https://issues.apache.org/jira/browse/HIVE-14818
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14818.01.patch, HIVE-14818.02.patch
>
>
> The current setting is 30 retries, with a 1-minute sleep between each one.
> The settings are likely bad for a production cluster as well. For tests, this 
> should be a lot lower.
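A minimal sketch of making the two knobs configurable so tests can use small values; the method and parameter names are illustrative, not the actual HiveServer2 startup code:

{code}
import java.util.concurrent.Callable;

public class StartupRetrySketch {
  // probe returns true once the server answers; maxAttempts/sleepMs replace
  // the hard-coded 30 retries x 1-minute sleep described above.
  static void waitForServer(Callable<Boolean> probe, int maxAttempts, long sleepMs)
      throws Exception {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (probe.call()) {
        return; // server is up
      }
      Thread.sleep(sleepMs); // production: 30 x 60s; tests: e.g. 5 x 1s
    }
    throw new IllegalStateException("Server did not start after " + maxAttempts + " attempts");
  }
}
{code}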



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14818) Reduce number of retries while starting HiveServer for tests

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515020#comment-15515020
 ] 

Hive QA commented on HIVE-14818:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829919/HIVE-14818.01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1281/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1281/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1281/

Messages:
{noformat}
 This message was trimmed, see log for full details 

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/tmp/conf
 [copy] Copying 15 files to 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-storage-api ---
[INFO] Compiling 7 source files to 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ hive-storage-api 
---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ hive-storage-api ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/hive-storage-api-2.2.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-storage-api ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ 
hive-storage-api ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/storage-api/target/hive-storage-api-2.2.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-storage-api/2.2.0-SNAPSHOT/hive-storage-api-2.2.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/storage-api/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-storage-api/2.2.0-SNAPSHOT/hive-storage-api-2.2.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive ORC 2.2.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-orc ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/orc/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/orc 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-orc ---
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-orc ---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/orc/src/gen/protobuf-java 
added.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-orc ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-orc 
---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/orc/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-orc ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-orc ---
[INFO] Compiling 71 source files to 
/data/hive-ptest/working/apache-github-source-source/orc/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/tools/FileDump.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/tools/FileDump.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/impl/RecordReaderImpl.java:
 
/data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/impl/RecordReaderImpl.java
 uses unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/impl/RecordReaderImpl.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-orc ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 7 resources
[INFO] Copying 3 resources
[INFO] 

[jira] [Commented] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515017#comment-15515017
 ] 

Hive QA commented on HIVE-14819:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829917/HIVE-14819.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10558 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1280/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1280/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1280/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829917 - PreCommit-HIVE-Build

> FunctionInfo for permanent functions shows TEMPORARY FunctionType
> -
>
> Key: HIVE-14819
> URL: https://issues.apache.org/jira/browse/HIVE-14819
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14819.1.patch
>
>
> The FunctionInfo has a FunctionType field which describes if the function is 
> a builtin/persistent/temporary function. But for permanent functions, the 
> FunctionInfo being returned by the FunctionRegistry is showing the type to be 
> TEMPORARY.
> This affects things which may be depending on function type, for example 
> LlapDecider, which will allow builtin/persistent UDFs to be used in LLAP but 
> not temporary functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14824) Separate fstype from cluster type in QTestUtil

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14824:
--
Attachment: HIVE-14824.01.patch

[~prasanth_j] - please review.

After this, to run TestEncHdfsDriver on Tez, the following change in CliConfigs 
is sufficient. Similarly for LLAP, Spark, etc.

{code}
-setHiveConfDir("data/conf");
-setClusterType(MiniClusterType.mr);
+setHiveConfDir("data/conf/tez");
+setClusterType(MiniClusterType.tez);
{code}

> Separate fstype from cluster type in QTestUtil
> --
>
> Key: HIVE-14824
> URL: https://issues.apache.org/jira/browse/HIVE-14824
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14824.01.patch
>
>
> The QTestUtil cluster type encodes the file system, e.g. 
> MiniClusterType.encrypted means mr + encrypted HDFS, spark means file://, mr 
> means HDFS, etc.
> These can be separated out. E.g. to add tests for tez against encrypted HDFS and 
> llap against encrypted HDFS, I'd need to introduce 2 new cluster types.
> Instead it's better to separate the storage into its own types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14824) Separate fstype from cluster type in QTestUtil

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14824:
--
Status: Patch Available  (was: Open)

> Separate fstype from cluster type in QTestUtil
> --
>
> Key: HIVE-14824
> URL: https://issues.apache.org/jira/browse/HIVE-14824
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14824.01.patch
>
>
> The QTestUtil cluster type encodes the file system, e.g. 
> MiniClusterType.encrypted means mr + encrypted HDFS, spark means file://, mr 
> means HDFS, etc.
> These can be separated out. E.g. to add tests for tez against encrypted HDFS and 
> llap against encrypted HDFS, I'd need to introduce 2 new cluster types.
> Instead it's better to separate the storage into its own types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14823) ZooKeeperHiveLockManager logs WAY too much on INFO level

2016-09-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-14823.
-
Resolution: Duplicate

Nm, dup of HIVE-12966

> ZooKeeperHiveLockManager logs WAY too much on INFO level
> 
>
> Key: HIVE-14823
> URL: https://issues.apache.org/jira/browse/HIVE-14823
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The "about to release lock ..." message can be logged a huge number of times 
> for large tables. It should be DEBUG or even TRACE.
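A minimal sketch of the suggested demotion, using slf4j as Hive does; the logger and message text are illustrative, not the actual ZooKeeperHiveLockManager code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LockLogSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LockLogSketch.class);

  void releaseLock(String lockPath) {
    // DEBUG (or TRACE) instead of INFO, so per-lock messages disappear
    // from default production logs.
    LOG.debug("about to release lock {}", lockPath);
    // ... actual release logic ...
  }
}
{code}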



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14823) ZooKeeperHiveLockManager logs WAY too much on INFO level

2016-09-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-14823:
---

Assignee: Sergey Shelukhin

> ZooKeeperHiveLockManager logs WAY too much on INFO level
> 
>
> Key: HIVE-14823
> URL: https://issues.apache.org/jira/browse/HIVE-14823
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The "about to release lock ..." message can be logged a huge number of times 
> for large tables. It should be DEBUG or even TRACE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width

2016-09-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-13098:
---

Assignee: Sergey Shelukhin

> Add a strict check for when the decimal gets converted to null due to 
> insufficient width
> 
>
> Key: HIVE-13098
> URL: https://issues.apache.org/jira/browse/HIVE-13098
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> When e.g. 99 is selected as decimal(5,0), the result is null. This can be 
> problematic, esp. if the data is written to a table and lost without the user 
> realizing it. There should be an option to error out in such cases instead; 
> it should probably be on by default and the error message should instruct the 
> user on how to disable it.
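A minimal sketch of such a check, assuming a hypothetical strict flag; the condition and error message are illustrative only, not the eventual Hive implementation:

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalWidthSketch {
  // Returns v as decimal(precision, scale), or null/error when it doesn't fit.
  static BigDecimal enforce(BigDecimal v, int precision, int scale, boolean strict) {
    BigDecimal scaled = v.setScale(scale, RoundingMode.HALF_UP);
    if (scaled.precision() > precision) { // too many digits for decimal(p,s)
      if (strict) {
        // Proposed default: fail loudly and tell the user how to opt out.
        throw new IllegalArgumentException("Value " + v + " does not fit decimal("
            + precision + "," + scale + "); disable the (hypothetical) strict"
            + " check to get NULL instead");
      }
      return null; // current silent behavior
    }
    return scaled;
  }
}
{code}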



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14426) Extensive logging on info level in WebHCat

2016-09-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514982#comment-15514982
 ] 

Eugene Koifman commented on HIVE-14426:
---

FYI, WebHCat doesn't really have any JUnit tests, so these changes (at least 
the WebHCat part) are not being tested by the build bot.  Most WebHCat tests 
are under hcatalog/src/test/e2e/templeton/ and require a running Hadoop 
instance.

> Extensive logging on info level in WebHCat
> --
>
> Key: HIVE-14426
> URL: https://issues.apache.org/jira/browse/HIVE-14426
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, 
> HIVE-14426.4.patch, HIVE-14426.5.patch, HIVE-14426.6.patch, 
> HIVE-14426.7.patch, HIVE-14426.8.patch, HIVE-14426.9-branch-2.1.patch, 
> HIVE-14426.9.patch, HIVE-14426.patch
>
>
> There is extensive logging in WebHCat at the info level, and even some 
> sensitive information could be logged



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14821) build q test

2016-09-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14821:
--
Attachment: HIVE-14821.2.patch

> build q test
> 
>
> Key: HIVE-14821
> URL: https://issues.apache.org/jira/browse/HIVE-14821
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14821.1.patch, HIVE-14821.2.patch, HIVE-14821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-09-22 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514904#comment-15514904
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

Test failures are irrelevant.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.2.patch, 
> HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, 
> HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch
>
>
> Given that the cartesian product edge is now available in Tez (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross-product queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514895#comment-15514895
 ] 

Hive QA commented on HIVE-14731:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829909/HIVE-14731.8.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10522 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapCliDriver-tez_schema_evolution.q-tez_join.q-file_with_header_footer.q-and-27-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1279/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1279/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1279/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829909 - PreCommit-HIVE-Build

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.2.patch, 
> HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, 
> HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch
>
>
> Given that the cartesian product edge is now available in Tez (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross-product queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514737#comment-15514737
 ] 

Szehon Ho commented on HIVE-14713:
--

I think there is a 24-hour wait after the last +1 before merging (at least the last 
time I checked). Feel free to ping again if it is forgotten.

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, 
> HIVE-14713.3.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14821) build q test

2016-09-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14821:
--
Attachment: HIVE-14821.1.patch

> build q test
> 
>
> Key: HIVE-14821
> URL: https://issues.apache.org/jira/browse/HIVE-14821
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14821.1.patch, HIVE-14821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14817) Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514671#comment-15514671
 ] 

Hive QA commented on HIVE-14817:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829914/HIVE-14817.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10555 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1278/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1278/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1278/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829914 - PreCommit-HIVE-Build

> Shutdown the SessionManager timeoutChecker thread properly upon shutdown
> 
>
> Key: HIVE-14817
> URL: https://issues.apache.org/jira/browse/HIVE-14817
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14817.01.patch
>
>
> Shutdown of the SessionManager waits 10 seconds for all threads on the 
> threadpoolExecutor to shut down correctly.
> The cleaner thread - with default settings - will take 6 hours to shut down, 
> so essentially any shutdown of HS2 is always delayed by 10s.
> The cleaner thread should be shut down properly.
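A minimal sketch of the fix direction: interrupt the checker on shutdown instead of waiting out its sleep. The names are illustrative, not the actual SessionManager fields:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TimeoutCheckerSketch {
  private final ExecutorService pool = Executors.newSingleThreadExecutor();
  private volatile boolean shutdown = false;

  void start(long checkIntervalMs) {
    pool.execute(() -> {
      while (!shutdown) {
        try {
          Thread.sleep(checkIntervalMs); // a long interval blocks a naive shutdown
          // ... close timed-out sessions here ...
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return; // exit promptly once interrupted
        }
      }
    });
  }

  void stop() throws InterruptedException {
    shutdown = true;
    pool.shutdownNow(); // interrupts the sleeping checker thread
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
{code}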



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14821) build q test

2016-09-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14821:
--
Status: Patch Available  (was: Open)

> build q test
> 
>
> Key: HIVE-14821
> URL: https://issues.apache.org/jira/browse/HIVE-14821
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514582#comment-15514582
 ] 

Illya Yalovyy commented on HIVE-14713:
--

[~szehon], [~ctang.ma],
The CR got a "ship it"; please advise on the next step to get this patch 
accepted.



> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, 
> HIVE-14713.3.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14821) build q test

2016-09-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14821:
--
Attachment: HIVE-14821.patch

> build q test
> 
>
> Key: HIVE-14821
> URL: https://issues.apache.org/jira/browse/HIVE-14821
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514563#comment-15514563
 ] 

Aihua Xu commented on HIVE-12222:
-

Thanks Xuefu for reviewing.

1. Currently I don't try to handle spaces in the string. If it's not configured 
properly, it will fall back to 0 (which means a random port). Do you think we 
should? It seems we are strict when handling entries in the Hive config.
2. You are right. I thought we should do that but forgot to add that logic 
when implementing it. I will add it.
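A minimal sketch of the two points above - tolerating spaces when parsing a range property such as "30000-30010,30100" and retrying until a free port is found; the property format and method names are assumptions, not the patch's actual code:

{code}
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class PortRangeSketch {
  // Parses "30000-30010,30100" (spaces around ',' and '-' tolerated).
  static List<Integer> parsePorts(String value) {
    List<Integer> ports = new ArrayList<>();
    for (String part : value.split(",")) {
      part = part.trim();
      if (part.contains("-")) {
        String[] range = part.split("-");
        int from = Integer.parseInt(range[0].trim());
        int to = Integer.parseInt(range[1].trim());
        for (int p = from; p <= to; p++) {
          ports.add(p);
        }
      } else if (!part.isEmpty()) {
        ports.add(Integer.parseInt(part));
      }
    }
    return ports;
  }

  // Tries each candidate port in turn instead of giving up on the first miss.
  static ServerSocket bindFirstFree(List<Integer> ports) throws IOException {
    for (int port : ports) {
      try {
        return new ServerSocket(port);
      } catch (IOException inUse) {
        // port taken; try the next one
      }
    }
    throw new IOException("No free port in the configured range");
  }
}
{code}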

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> The port number is assigned 0, which means it will be a random port every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is causing problems configuring the firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use that, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.); this may cause the HS2 process to run into OOME, choke and die, and hit 
> various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid any memory, file 
> handle, or disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), the security on the edge node 
> needs to be fortified and enhanced; this is where the firewall and auditing 
> come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes that easier since we 
> can focus
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14580) Introduce || operator

2016-09-22 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-14580:

Attachment: HIVE-14580.1.patch

I didn't want to "really" introduce a full new operator - and possibly open up 
bugs due to concat/|| implementation differences - so I looked 
into creating an alias for the concat() udf, which already has optimization & 
vectorization support.

My options were:

* purely antlr based - I failed with this approach
* minor antlr change + AST rewrite - the chosen path

For the rewrite I've seen a few places where I could add this... but only 
{{SemanticAnalyzer.processPositionAlias}} looked promising - there are other 
places, but I think {{TypeCheckProcFactory}} would be a bit late... and adding 
this to any optimization-related rewrites would be inappropriate because 
this is not an optimization.

I've done a minor refactor and split {{processPositionAlias}} from its walk 
logic - which I'm using to dispatch the concatenation rewrites too.

[~pxiong] what do you think about it?
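A minimal sketch of what the AST-level alias amounts to; the node classes are toy stand-ins, not Hive's actual ASTNode API:

{code}
import java.util.List;

public class ConcatRewriteSketch {
  sealed interface Node permits Leaf, BarBar, Call {}
  record Leaf(String text) implements Node {}                 // 'ab'
  record BarBar(Node left, Node right) implements Node {}     // a || b
  record Call(String fn, List<Node> args) implements Node {}  // concat(a, b)

  // Every '||' node becomes a concat() call, so 'ab' || 'c' || 'd'
  // (left-associative) turns into concat(concat('ab', 'c'), 'd').
  static Node rewrite(Node n) {
    if (n instanceof BarBar b) {
      return new Call("concat", List.of(rewrite(b.left()), rewrite(b.right())));
    }
    return n;
  }

  public static void main(String[] args) {
    Node ast = new BarBar(new BarBar(new Leaf("'ab'"), new Leaf("'c'")), new Leaf("'d'"));
    System.out.println(rewrite(ast));
  }
}
{code}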

> Introduce || operator
> -
>
> Key: HIVE-14580
> URL: https://issues.apache.org/jira/browse/HIVE-14580
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14580.1.patch
>
>
> Functionally equivalent to the concat() UDF, but the SQL standard allows using 
> || for string concatenation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14580) Introduce || operator

2016-09-22 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-14580:

Status: Patch Available  (was: Open)

> Introduce || operator
> -
>
> Key: HIVE-14580
> URL: https://issues.apache.org/jira/browse/HIVE-14580
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14580.1.patch
>
>
> Functionally equivalent to the concat() UDF, but the SQL standard allows using 
> || for string concatenation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514540#comment-15514540
 ] 

Sergio Peña commented on HIVE-14373:


Thanks [~poeppt]. I will take a look at the patch tomorrow or early next week.

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514528#comment-15514528
 ] 

Hive QA commented on HIVE-12222:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829905/HIVE-12222.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.rpc.TestRpc.testServerPort
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1277/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1277/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1277/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829905 - PreCommit-HIVE-Build

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> The port number is assigned 0, which means it will be a random port every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is causing problems configuring the firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use that, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.); this may cause the HS2 process to run into OOME, choke and die, and hit 
> various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid any memory, file 
> handle, or disk space issues.
> From a security 

[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514485#comment-15514485
 ] 

Xuefu Zhang commented on HIVE-12222:


HI [~aihuaxu], thanks for working on this. The patch looks good. I have two 
minor questions:
1. Do we have a strict syntax requirement on the format of the new property 
value? For instance, what happens if there is space around ',' or '-'.
2. What happens if the randomly selected port is not available? Should we retry 
until we get a good one?


> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> The port number is assigned 0, which means it will be a random port every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is causing problems configuring the firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use that, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.); this may cause the HS2 process to run into OOME, choke and die, and hit 
> various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid any memory, file 
> handle, or disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), the security on the edge node 
> needs to be fortified and enhanced; this is where the firewall and auditing 
> come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes that easier since we 
> can focus
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-09-22 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514470#comment-15514470
 ] 

Jason Dere commented on HIVE-13903:
---

Hi [~prongs], just trying to get a little background on this one - was the JAR 
being downloaded once per session, or was it getting downloaded every time the 
UDF was being used, even in the same session?

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Fix For: 2.1.0
>
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, 
> HIVE-13903.02.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded 
> multiple times, each call originating from Registry.getFunctionInfo. This 
> increases the time for the query, especially if that query is just an explain 
> query. The jar should be downloaded once, and not downloaded again if the UDF 
> class is accessible in the current thread.
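A minimal sketch of the caching idea from the last sentence above; localizeJar and the lookup are illustrative, not Hive's Registry internals:

{code}
public class UdfJarCacheSketch {
  // Skip the download when the UDF class is already loadable in this thread.
  static void ensureUdfAvailable(String className, String jarUri) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    try {
      Class.forName(className, false, loader); // already on the classpath?
      return;                                  // yes: nothing to download
    } catch (ClassNotFoundException missing) {
      localizeJar(jarUri);                     // no: fetch and register once
    }
  }

  static void localizeJar(String jarUri) {
    // placeholder for the actual download + classloader registration
    System.out.println("downloading " + jarUri);
  }
}
{code}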



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly

2016-09-22 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514429#comment-15514429
 ] 

Yongzhi Chen commented on HIVE-14820:
-

Simple change, LGTM   +1

> RPC server for spark inside HS2 is not getting server address properly
> --
>
> Key: HIVE-14820
> URL: https://issues.apache.org/jira/browse/HIVE-14820
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14820.1.patch
>
>
> When hive.spark.client.rpc.server.address is configured, this property is not 
> retrieved properly because we get the value via {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}}, which always 
> returns null in the getServerAddress() call of RpcConfiguration.java. Rather, it 
> should be {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514387#comment-15514387
 ] 

Thomas Poepping commented on HIVE-14373:


It wouldn't make sense for these test failures to be related to this patch, as I 
am touching almost no existing code. Can I get eyes on this?

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14751) Add support for date truncation

2016-09-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514382#comment-15514382
 ] 

Ashutosh Chauhan commented on HIVE-14751:
-

LGTM +1
Question: This currently only supports a timestamp argument. Shall it also 
support date and interval? We can add that support later; just want to make 
sure it's not something you missed.

> Add support for date truncation
> ---
>
> Key: HIVE-14751
> URL: https://issues.apache.org/jira/browse/HIVE-14751
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14751.patch
>
>
> Add support for {{floor(<timestamp> to <unit>)}}, which is equivalent to 
> {{date_trunc(<unit>, <timestamp>)}}.
> https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14579) Add support for date extract

2016-09-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514352#comment-15514352
 ] 

Ashutosh Chauhan commented on HIVE-14579:
-

Clever trick of rewriting in the parser.
+1 

> Add support for date extract
> 
>
> Key: HIVE-14579
> URL: https://issues.apache.org/jira/browse/HIVE-14579
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Ashutosh Chauhan
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14579.01.patch, HIVE-14579.patch, HIVE-14579.patch
>
>
> https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514355#comment-15514355
 ] 

Hive QA commented on HIVE-14373:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829898/HIVE-14373.05.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10555 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1276/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1276/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1276/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829898 - PreCommit-HIVE-Build

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because they will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514347#comment-15514347
 ] 

Chaoyu Tang commented on HIVE-14713:


LGTM, +1


> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, 
> HIVE-14713.3.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.
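
For a flavor of what such refactoring enables, a hedged test sketch (the constructor seam, field names, and fake objects are assumptions, not the actual refactoring):

{code}
// Hedged sketch with an assumed injection seam (not the real API): a fake
// directory-search factory lets the provider be exercised without a live
// LDAP server. 'conf' and 'fakeDirSearchFactory' are hypothetical fixtures.
@Test(expected = AuthenticationException.class)
public void rejectsUserNotInConfiguredGroup() throws Exception {
  LdapAuthenticationProviderImpl provider =
      new LdapAuthenticationProviderImpl(conf, fakeDirSearchFactory);
  provider.Authenticate("bob", "secret");  // fake search returns no groups
}
{code}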



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14820:

Status: Patch Available  (was: Open)

patch-1: Change to use .varname as the key to the map. Otherwise, 
get(Object) will always return null.
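
For illustration, a standalone sketch of the failure mode (illustrative names, not the actual RpcConfiguration code): a map keyed by strings silently misses when the enum constant itself is passed to {{get(Object)}}.

{code}
import java.util.HashMap;
import java.util.Map;

public class VarnameLookupDemo {
  enum ConfVars { SPARK_RPC_SERVER_ADDRESS }  // stand-in for HiveConf.ConfVars

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put("SPARK_RPC_SERVER_ADDRESS", "hs2-host.example.com");

    // Compiles, since Map.get takes Object, but an enum never equals a
    // String key, so this always prints null:
    System.out.println(config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS));

    // Looking up by the variable's string name (the analogue of .varname)
    // finds the value:
    System.out.println(config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS.name()));
  }
}
{code}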

> RPC server for spark inside HS2 is not getting server address properly
> --
>
> Key: HIVE-14820
> URL: https://issues.apache.org/jira/browse/HIVE-14820
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14820.1.patch
>
>
> When hive.spark.client.rpc.server.address is configured, this property is not 
> retrieved properly, because we get the value with {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}}, which always 
> returns null in the getServerAddress() call of RpcConfiguration.java. Rather, it 
> should be {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14820:

Attachment: HIVE-14820.1.patch

> RPC server for spark inside HS2 is not getting server address properly
> --
>
> Key: HIVE-14820
> URL: https://issues.apache.org/jira/browse/HIVE-14820
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14820.1.patch
>
>
> When hive.spark.client.rpc.server.address is configured, this property is not 
> retrieved properly, because we get the value with {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}}, which always 
> returns null in the getServerAddress() call of RpcConfiguration.java. Rather, it 
> should be {{String hiveHost = 
> config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-1:

Component/s: Spark

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-1.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>     @Override
>     public void initChannel(SocketChannel ch) throws Exception {
>       SaslServerHandler saslHandler = new SaslServerHandler(config);
>       final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>       saslHandler.rpc = newRpc;
>       Runnable cancelTask = new Runnable() {
>         @Override
>         public void run() {
>           LOG.warn("Timed out waiting for hello from client.");
>           newRpc.close();
>         }
>       };
>       saslHandler.cancelTask = group.schedule(cancelTask,
>           RpcServer.this.config.getServerConnectTimeoutMs(),
>           TimeUnit.MILLISECONDS);
>     }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> It could be true or not, but this is what I observe and encounter from time 
> to time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, and 
> hit various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence the firewall rules and 
> auditing.
> - Regulation/compliance auditing is another requirement: to monitor all 
> traffic, specifying and locking down the ports makes it easier, since we can 
> focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514263#comment-15514263
 ] 

Lefty Leverenz commented on HIVE-14793:
---

Okay, thanks.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch, 
> HIVE-14793.03.patch
>
>
> Post HIVE-14734, the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (this is currently hardcoded to github.com/apache/hive).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14818) Reduce number of retries while starting HiveServer for tests

2016-09-22 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514259#comment-15514259
 ] 

Prasanth Jayachandran commented on HIVE-14818:
--

Not relevant to this issue, but the default 30 min sleep time feels like a lot to me.

Other than that patch lgtm +1

> Reduce number of retries while starting HiveServer for tests
> 
>
> Key: HIVE-14818
> URL: https://issues.apache.org/jira/browse/HIVE-14818
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14818.01.patch
>
>
> The current setting is 30 retries, with a 1-minute sleep between each one.
> These settings are likely bad for a production cluster as well. For tests, this 
> should be a lot lower.
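
As a back-of-the-envelope sketch (names assumed, not the actual test-harness code), the worst-case wait is attempts × sleep, so 30 retries at 1 minute each can stall a failing startup for half an hour:

{code}
// Hedged sketch: a generic bounded-retry loop. With attempts=30 and
// sleepMs=60_000 the worst case is 30 minutes; tests want far less.
static void waitForServer(java.util.function.BooleanSupplier isUp,
                          int attempts, long sleepMs)
    throws InterruptedException {
  for (int i = 0; i < attempts; i++) {
    if (isUp.getAsBoolean()) {
      return;                  // server came up
    }
    Thread.sleep(sleepMs);     // back off before the next probe
  }
  throw new IllegalStateException("Server did not start in time");
}
{code}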



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14818) Reduce number of retries while starting HiveServer for tests

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14818:
--
Attachment: HIVE-14818.01.patch

[~thejas], [~prasanth_j] - please review.

> Reduce number of retries while starting HiveServer for tests
> 
>
> Key: HIVE-14818
> URL: https://issues.apache.org/jira/browse/HIVE-14818
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14818.01.patch
>
>
> The current setting is 30 retries, with a 1-minute sleep between each one.
> These settings are likely bad for a production cluster as well. For tests, this 
> should be a lot lower.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14818) Reduce number of retries while starting HiveServer for tests

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14818:
--
Status: Patch Available  (was: Open)

> Reduce number of retries while starting HiveServer for tests
> 
>
> Key: HIVE-14818
> URL: https://issues.apache.org/jira/browse/HIVE-14818
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14818.01.patch
>
>
> The current setting is 30 retries, with a 1-minute sleep between each one.
> These settings are likely bad for a production cluster as well. For tests, this 
> should be a lot lower.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType

2016-09-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-14819:
--
Status: Patch Available  (was: Open)

> FunctionInfo for permanent functions shows TEMPORARY FunctionType
> -
>
> Key: HIVE-14819
> URL: https://issues.apache.org/jira/browse/HIVE-14819
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14819.1.patch
>
>
> The FunctionInfo has a FunctionType field which describes whether the function is 
> a builtin/persistent/temporary function. But for permanent functions, the 
> FunctionInfo returned by the FunctionRegistry shows the type as 
> TEMPORARY.
> This affects things that may depend on the function type, for example 
> LlapDecider, which allows builtin/persistent UDFs to be used in LLAP but 
> not temporary functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType

2016-09-22 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514230#comment-15514230
 ] 

Jason Dere commented on HIVE-14819:
---

Patch to allow the registry to set the PERSISTENT type when registering permanent 
functions to the session registry. Previously, all functions added to the session 
registry had the TEMPORARY tag.
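
A minimal sketch of the idea (all names assumed; not the actual FunctionRegistry API): thread the function type through registration instead of hardcoding TEMPORARY.

{code}
// Hedged sketch with hypothetical types (not the actual Hive classes):
import java.util.HashMap;
import java.util.Map;

class SessionRegistrySketch {
  enum FunctionType { BUILTIN, PERSISTENT, TEMPORARY }
  record FunctionInfo(String name, FunctionType type) {}

  private final Map<String, FunctionInfo> sessionFunctions = new HashMap<>();

  // Before the fix: everything registered here was stamped TEMPORARY.
  // After: the caller-supplied type is preserved, so a permanent function
  // keeps PERSISTENT and passes checks like LlapDecider's type filter.
  void register(String name, FunctionType type) {
    sessionFunctions.put(name, new FunctionInfo(name, type));
  }
}
{code}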

> FunctionInfo for permanent functions shows TEMPORARY FunctionType
> -
>
> Key: HIVE-14819
> URL: https://issues.apache.org/jira/browse/HIVE-14819
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14819.1.patch
>
>
> The FunctionInfo has a FunctionType field which describes whether the function is 
> a builtin/persistent/temporary function. But for permanent functions, the 
> FunctionInfo returned by the FunctionRegistry shows the type as 
> TEMPORARY.
> This affects things that may depend on the function type, for example 
> LlapDecider, which allows builtin/persistent UDFs to be used in LLAP but 
> not temporary functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType

2016-09-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-14819:
--
Attachment: HIVE-14819.1.patch

> FunctionInfo for permanent functions shows TEMPORARY FunctionType
> -
>
> Key: HIVE-14819
> URL: https://issues.apache.org/jira/browse/HIVE-14819
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.1.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14819.1.patch
>
>
> The FunctionInfo has a FunctionType field which describes whether the function is 
> a builtin/persistent/temporary function. But for permanent functions, the 
> FunctionInfo returned by the FunctionRegistry shows the type as 
> TEMPORARY.
> This affects things that may depend on the function type, for example 
> LlapDecider, which allows builtin/persistent UDFs to be used in LLAP but 
> not temporary functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code

2016-09-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514219#comment-15514219
 ] 

Lefty Leverenz commented on HIVE-9423:
--

+1 for the new error messages (patch 4)

> HiveServer2: Provide the user with different error messages depending on the 
> Thrift client exception code
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behaviour of the background thread 
> pool to have a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and display graceful degradation under overload.
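
For illustration, a hedged sketch of the client-side half of this (a hypothetical helper, not the actual JDBC driver code): map distinct transport-level failures to actionable messages instead of one generic "could not connect".

{code}
// Hedged sketch (hypothetical helper; not the actual driver API):
static String describeConnectFailure(Throwable t) {
  if (t instanceof java.net.ConnectException) {
    return "HiveServer2 refused the connection; the server may be down.";
  }
  if (t instanceof java.net.SocketTimeoutException) {
    return "Connection timed out; the server may have exhausted "
        + "hive.server2.thrift.max.worker.threads.";
  }
  return "Could not connect to HiveServer2: " + t.getMessage();
}
{code}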



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14683) ptest uses invalid killall command

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-14683.
---
Resolution: Duplicate
  Assignee: (was: Siddharth Seth)

> ptest uses invalid killall command
> --
>
> Key: HIVE-14683
> URL: https://issues.apache.org/jira/browse/HIVE-14683
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>
> killall -q -9 -f java 
> -f is an invalid flag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514199#comment-15514199
 ] 

Hive QA commented on HIVE-14713:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829889/HIVE-14713.3.patch

{color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10573 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver-llap_acid.q-explain_ddl.q-masking_3.q-and-27-more - did not 
produce a TEST-*.xml file
TestCliDriver-ql_rewrite_gbtoidx.q-json_serde1.q-auto_join23.q-and-27-more - 
did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1275/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1275/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1275/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829889 - PreCommit-HIVE-Build

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, 
> HIVE-14713.3.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14817) Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14817:
--
Status: Patch Available  (was: Open)

> Shutdown the SessionManager timeoutChecker thread properly upon shutdown
> 
>
> Key: HIVE-14817
> URL: https://issues.apache.org/jira/browse/HIVE-14817
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14817.01.patch
>
>
> Shutdown for SessionManager waits 10 seconds for all threads on the 
> threadpoolExecutor to shut down correctly.
> The cleaner thread - with default settings - will take 6 hours to shut down, 
> so essentially any shutdown of HS2 is always delayed by 10s.
> The cleaner thread should be shut down properly.
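
A hedged sketch of the fix's shape (names assumed; not the actual SessionManager code): interrupt the sleeping checker instead of waiting out its full interval.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class ShutdownSketch {
  void stopChecker(ExecutorService checkerPool) throws InterruptedException {
    // shutdownNow() delivers an interrupt, waking a Thread.sleep() in the
    // checker loop immediately instead of after the full sleep interval.
    checkerPool.shutdownNow();
    if (!checkerPool.awaitTermination(10, TimeUnit.SECONDS)) {
      System.err.println("timeout checker did not exit cleanly");
    }
  }
}
{code}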



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14817) Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2016-09-22 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14817:
--
Attachment: HIVE-14817.01.patch

[~thejas], [~prasanth_j] - please review. This cuts the test runtime of 
TestXSRFDFilter by 40 seconds, and likely other tests as well.

> Shutdown the SessionManager timeoutChecker thread properly upon shutdown
> 
>
> Key: HIVE-14817
> URL: https://issues.apache.org/jira/browse/HIVE-14817
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14817.01.patch
>
>
> Shutdown for SessionManager waits 10 seconds for all threads on the 
> threadpoolExecutor to shut down correctly.
> The cleaner thread - with default settings - will take 6 hours to shut down, 
> so essentially any shutdown of HS2 is always delayed by 10s.
> The cleaner thread should be shut down properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-09-22 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.8.patch

Uploaded patch to fix TestMiniLlapCliDriver[cross_join]. The other test failures 
are unrelated.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.2.patch, 
> HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, 
> HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch
>
>
> Given that the cartesian product edge is now available in Tez (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross-product queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14774) Canceling query using Ctrl-C in beeline might lead to stale locks

2016-09-22 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14774:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed to 2.2.0 and 2.1.1. Thanks [~jxiang] [~mohitsabharwal] for the review.

> Canceling query using Ctrl-C in beeline might lead to stale locks
> -
>
> Key: HIVE-14774
> URL: https://issues.apache.org/jira/browse/HIVE-14774
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14774.patch
>
>
> Terminating a running query using Ctrl-C in Beeline might lead to stale locks, 
> since the process running the query might still be able to acquire the locks 
> but fail to release them after the query terminates abnormally.
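
A hedged sketch of the defensive pattern implied above (the helper names are assumptions, not the actual Driver code): make lock release unconditional, even when the client disconnects mid-query.

{code}
// Hedged sketch with assumed helpers (acquireLocks/execute/releaseLocks):
void runWithLocks(String queryId) throws Exception {
  acquireLocks(queryId);
  try {
    execute(queryId);
  } finally {
    releaseLocks(queryId);  // runs even on interrupt/abnormal termination
  }
}
{code}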



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-1:

Attachment: HIVE-1.1.patch

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-1.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>     @Override
>     public void initChannel(SocketChannel ch) throws Exception {
>       SaslServerHandler saslHandler = new SaslServerHandler(config);
>       final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>       saslHandler.rpc = newRpc;
>       Runnable cancelTask = new Runnable() {
>         @Override
>         public void run() {
>           LOG.warn("Timed out waiting for hello from client.");
>           newRpc.close();
>         }
>       };
>       saslHandler.cancelTask = group.schedule(cancelTask,
>           RpcServer.this.config.getServerConnectTimeoutMs(),
>           TimeUnit.MILLISECONDS);
>     }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> It could be true or not, but this is what I observe and encounter from time 
> to time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, and 
> hit various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence the firewall rules and 
> auditing.
> - Regulation/compliance auditing is another requirement: to monitor all 
> traffic, specifying and locking down the ports makes it easier, since we can 
> focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-1:

Attachment: (was: HIVE-1.1.patch)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-1.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>     @Override
>     public void initChannel(SocketChannel ch) throws Exception {
>       SaslServerHandler saslHandler = new SaslServerHandler(config);
>       final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>       saslHandler.rpc = newRpc;
>       Runnable cancelTask = new Runnable() {
>         @Override
>         public void run() {
>           LOG.warn("Timed out waiting for hello from client.");
>           newRpc.close();
>         }
>       };
>       saslHandler.cancelTask = group.schedule(cancelTask,
>           RpcServer.this.config.getServerConnectTimeoutMs(),
>           TimeUnit.MILLISECONDS);
>     }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> It could be true or not, but this is what I observe and encounter from time 
> to time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, and 
> hit various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence the firewall rules and 
> auditing.
> - Regulation/compliance auditing is another requirement: to monitor all 
> traffic, specifying and locking down the ports makes it easier, since we can 
> focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514109#comment-15514109
 ] 

Aihua Xu commented on HIVE-1:
-

[~xuefuz] Can you help review the patch? Thanks.

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-1.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>     @Override
>     public void initChannel(SocketChannel ch) throws Exception {
>       SaslServerHandler saslHandler = new SaslServerHandler(config);
>       final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>       saslHandler.rpc = newRpc;
>       Runnable cancelTask = new Runnable() {
>         @Override
>         public void run() {
>           LOG.warn("Timed out waiting for hello from client.");
>           newRpc.close();
>         }
>       };
>       saslHandler.cancelTask = group.schedule(cancelTask,
>           RpcServer.this.config.getServerConnectTimeoutMs(),
>           TimeUnit.MILLISECONDS);
>     }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> It could be true or not, but this is what I observe and encounter from time 
> to time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, and 
> hit various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence the firewall rules and 
> auditing.
> - Regulation/compliance auditing is another requirement: to monitor all 
> traffic, specifying and locking down the ports makes it easier, since we can 
> focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-1:

Status: Patch Available  (was: Open)

Patch-1: made the change to add a configuration for the port range.
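
For illustration, a sketch of what a port-range bind loop can look like (the range values and class name are assumptions, not necessarily what the patch implements):

{code}
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeBindSketch {
  // Try each port in [lo, hi] and return the first socket that binds.
  static ServerSocket bindInRange(int lo, int hi) throws IOException {
    for (int port = lo; port <= hi; port++) {
      try {
        return new ServerSocket(port);
      } catch (IOException e) {
        // port busy; try the next one in the configured range
      }
    }
    throw new IOException("No free port in range " + lo + "-" + hi);
  }

  public static void main(String[] args) throws IOException {
    // e.g. a firewall-friendly range configured once for the cluster
    ServerSocket s = bindInRange(49152, 49200);
    System.out.println("Bound RPC server to port " + s.getLocalPort());
    s.close();
  }
}
{code}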

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-1.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>     @Override
>     public void initChannel(SocketChannel ch) throws Exception {
>       SaslServerHandler saslHandler = new SaslServerHandler(config);
>       final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>       saslHandler.rpc = newRpc;
>       Runnable cancelTask = new Runnable() {
>         @Override
>         public void run() {
>           LOG.warn("Timed out waiting for hello from client.");
>           newRpc.close();
>         }
>       };
>       saslHandler.cancelTask = group.schedule(cancelTask,
>           RpcServer.this.config.getServerConnectTimeoutMs(),
>           TimeUnit.MILLISECONDS);
>     }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> It could be true or not, but this is what I observe and encounter from time 
> to time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, and 
> hit various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence the firewall rules and 
> auditing.
> - Regulation/compliance auditing is another requirement: to monitor all 
> traffic, specifying and locking down the ports makes it easier, since we can 
> focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-1:

Attachment: HIVE-1.1.patch

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-1.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>     @Override
>     public void initChannel(SocketChannel ch) throws Exception {
>       SaslServerHandler saslHandler = new SaslServerHandler(config);
>       final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>       saslHandler.rpc = newRpc;
>       Runnable cancelTask = new Runnable() {
>         @Override
>         public void run() {
>           LOG.warn("Timed out waiting for hello from client.");
>           newRpc.close();
>         }
>       };
>       saslHandler.cancelTask = group.schedule(cancelTask,
>           RpcServer.this.config.getServerConnectTimeoutMs(),
>           TimeUnit.MILLISECONDS);
>     }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via SSH). 
> Now, here comes the interesting part.
> It could be true or not, but this is what I observe and encounter from time 
> to time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, and 
> hit various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and 
> disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence the firewall rules and 
> auditing.
> - Regulation/compliance auditing is another requirement: to monitor all 
> traffic, specifying and locking down the ports makes it easier, since we can 
> focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Attachment: (was: HIVE-14373.05.patch)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because they will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Status: Patch Available  (was: In Progress)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.05.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because they will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Attachment: HIVE-14373.05.patch

I'll create a new review-board submission, as I can't edit the old one.

This patch takes what Abdullah had and adds improvements, including:
 * added an abstraction in CliDrivers to increase code reuse
 * allowed QTEST_LEAVE_FILES (implemented in HIVE-8100) to be used to leave 
files in the blobstore for debugging
 * implemented unique blobstore paths for individual test runs, so if multiple 
people start test runs at the same time with the same blobstore path, there 
will be no collisions
 * moved test.blobstore.path to the conf.xml file, so it need not be specified 
each time
 * fixed the README and added more examples

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.05.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because they will need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14774) Canceling query using Ctrl-C in beeline might lead to stale locks

2016-09-22 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513999#comment-15513999
 ] 

Mohit Sabharwal commented on HIVE-14774:


LGTM as well +1

> Canceling query using Ctrl-C in beeline might lead to stale locks
> -
>
> Key: HIVE-14774
> URL: https://issues.apache.org/jira/browse/HIVE-14774
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14774.patch
>
>
> Terminating a running query using Ctrl-C in Beeline might lead to stale locks, 
> since the process running the query might still be able to acquire the locks 
> but fail to release them after the query terminates abnormally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14814) metastoreClient is used directly in Hive cause NPE

2016-09-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14814:
-
Target Version/s: 1.3.0, 2.2.0, 2.1.1  (was: 1.3.0, 2.1.0, 2.2.0)

> metastoreClient is used directly in Hive cause NPE
> --
>
> Key: HIVE-14814
> URL: https://issues.apache.org/jira/browse/HIVE-14814
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14814.1.patch
>
>
> Changes introduced by HIVE-13622 use metastoreClient directly in Hive.java, 
> which may be null, causing an NPE. Instead, it should use getMSC(), which 
> initializes the metastoreClient variable when it is null.
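
A hedged sketch of the lazy-init accessor described above (illustrative; not the actual Hive.java code, and the factory method name is an assumption):

{code}
// Imports assumed: org.apache.hadoop.hive.metastore.IMetaStoreClient,
// org.apache.hadoop.hive.metastore.api.MetaException.
private IMetaStoreClient metastoreClient;  // may be null until first use

private synchronized IMetaStoreClient getMSC() throws MetaException {
  if (metastoreClient == null) {
    metastoreClient = createMetaStoreClient();  // assumed factory method
  }
  return metastoreClient;  // callers use this instead of the raw field
}
{code}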



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14814) metastoreClient is used directly in Hive cause NPE

2016-09-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14814:
-
  Resolution: Fixed
   Fix Version/s: 2.1.0
  2.2.0
  1.3.0
Target Version/s: 2.1.0, 1.3.0, 2.2.0  (was: 1.3.0, 2.1.0, 2.2.0)
  Status: Resolved  (was: Patch Available)

Test failures are unrelated. Committed to all branches.

> metastoreClient is used directly in Hive cause NPE
> --
>
> Key: HIVE-14814
> URL: https://issues.apache.org/jira/browse/HIVE-14814
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.2.0, 2.1.0
>
> Attachments: HIVE-14814.1.patch
>
>
> Changes introduced by HIVE-13622 use metastoreClient directly in Hive.java, 
> which may be null, causing an NPE. Instead, it should use getMSC(), which 
> initializes the metastoreClient variable when it is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14814) metastoreClient is used directly in Hive cause NPE

2016-09-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14814:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> metastoreClient is used directly in Hive cause NPE
> --
>
> Key: HIVE-14814
> URL: https://issues.apache.org/jira/browse/HIVE-14814
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14814.1.patch
>
>
> Changes introduced by HIVE-13622 use metastoreClient directly in Hive.java, 
> which may be null, causing an NPE. Instead, it should use getMSC(), which 
> initializes the metastoreClient variable when it is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-14713:
-
Status: Patch Available  (was: In Progress)

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, 
> HIVE-14713.3.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-14713:
-
Attachment: HIVE-14713.3.patch

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, 
> HIVE-14713.3.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513975#comment-15513975
 ] 

Illya Yalovyy commented on HIVE-14713:
--

The patch was updated with a minor performance improvement.

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, 
> HIVE-14713.3.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-22 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-14713:
-
Status: In Progress  (was: Patch Available)

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch
>
>
> Currently the LdapAuthenticationProviderImpl class is not covered by unit 
> tests. To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12222) Define port range in property for RPCServer

2016-09-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-12222:
---

Assignee: Aihua Xu

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means it will be a random port every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use that, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), and this may cause the HS2 process to run into OOME, choke and die, etc., 
> along with various other resource issues including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid any memory, file 
> handle, or disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), the security on the edge node 
> needs to be fortified and enhanced. Therefore, all the firewall rules and 
> auditing come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes it easier since we 
> can focus on a range to monitor and audit.
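
A minimal sketch of the behavior this JIRA asks for: instead of binding to 
port 0 (kernel-assigned, hence unpredictable), walk a configured range and 
bind to the first free port, so firewall rules only need to cover that range. 
This uses a raw ServerSocket for brevity; the actual fix would go into 
RpcServer's Netty bootstrap, and the range boundaries would come from a new, 
yet-to-be-named config property:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortRangeBind {
  // Tries each port in [low, high] and returns the first successful bind.
  public static ServerSocket bindInRange(int low, int high) throws IOException {
    for (int port = low; port <= high; port++) {
      ServerSocket socket = new ServerSocket();
      try {
        socket.bind(new InetSocketAddress(port));
        return socket;
      } catch (IOException e) {
        socket.close();   // port already in use; try the next one
      }
    }
    throw new IOException("No free port in range " + low + "-" + high);
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket s = bindInRange(30000, 30010)) {
      System.out.println("Bound to port " + s.getLocalPort());
    }
  }
}
{code}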



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Attachment: HIVE-14373.05.patch

I'll open a new review-board request for this, as I can't update the old one.

This patch takes what Abdullah had and:
 * creates an abstraction in the CliDrivers for code reuse
 * allows the QTEST_LEAVE_FILES environment variable implemented in HIVE-8100 
to be used to optionally leave files in S3 for inspection and debugging
 * abstracts the test.blobstore.path to the conf.xml file, so it doesn't need 
to be set each time
 * implements a unique folder identifier for each test run, so if multiple 
people run tests against the same blobstore path at the same time, there will 
be no collisions (see the sketch below)
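
A minimal sketch of the unique-folder idea from the last bullet, assuming the 
blobstore root comes from test.blobstore.path; every run writes under its own 
random suffix, so concurrent runs against the same path cannot collide:

{code}
import java.util.UUID;

public class BlobstoreTestPath {
  // e.g. s3a://bucket/hive-qtest -> s3a://bucket/hive-qtest/run-3f9c...
  public static String uniqueTestPath(String blobstoreRoot) {
    return blobstoreRoot.replaceAll("/+$", "") + "/run-" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    System.out.println(uniqueTestPath("s3a://my-bucket/hive-qtest/"));
  }
}
{code}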

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513905#comment-15513905
 ] 

Hive QA commented on HIVE-9423:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829872/HIVE-9423.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1274/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1274/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1274/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829872 - PreCommit-HIVE-Build

> HiveServer2: Provide the user with different error messages depending on the 
> Thrift client exception code
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Status: In Progress  (was: Patch Available)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13348) Add Event Nullification support for Replication

2016-09-22 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513804#comment-15513804
 ] 

Sushanth Sowmyan commented on HIVE-13348:
-

Removing the gsoc tag, as this was not pursued for GSoC.

> Add Event Nullification support for Replication
> ---
>
> Key: HIVE-13348
> URL: https://issues.apache.org/jira/browse/HIVE-13348
> Project: Hive
>  Issue Type: Sub-task
>  Components: Import/Export
>Reporter: Sushanth Sowmyan
>
> Replication, as implemented by HIVE-7973 works as follows:
> a) For every single modification to the Hive metastore, an event gets 
> triggered that logs a notification object.
> b) Replication tools such as Falcon can consume these notification objects as 
> a HCatReplicationTaskIterator from 
> HCatClient.getReplicationTasks(lastEventId, maxEvents, dbName, tableName).
> c) For each event,  we generate statements and distcp requirements for falcon 
> to export, distcp and import to do the replication (along with requisite 
> changes to export and import that would allow state management).
> The big thing missing from this picture is that while it works, it is pretty 
> dumb about how it works in that it will exhaustively process every single 
> event generated, and will try to do the export-distcp-import cycle for all 
> modifications, irrespective of whether or not that will actually get used at 
> import time.
> We need to build some sort of filtering logic which can process a batch of 
> events to identify events that will result in effective no-ops, and to 
> nullify those events from the stream before passing them on. The goal is to 
> minimize the number of events that the tools like Falcon would actually have 
> to process.
> Examples of cases where event nullification would take place:
> a) CREATE-DROP cases: If an object is being created in event#34 that will 
> eventually get dropped in event#47, then there is no point in replicating 
> this along. We simply null out both these events, and also, any other event 
> that references this object between event#34 and event#47.
> b) APPEND-APPEND : Some objects are replicated wholesale, which means every 
> APPEND that occurs would cause a full export of the object in question. At 
> this point, the prior APPENDS would all be supplanted by the last APPEND. 
> Thus, we could nullify all the prior such events. 
> Additional such cases can be inferred by analysis of the Export-Import relay 
> protocol definition at 
> https://issues.apache.org/jira/secure/attachment/12725999/EXIMReplicationReplayProtocol.pdf
>  or by reasoning out various event processing orders possible.
> Replication, as implemented by HIVE-7973 is merely a first step for 
> functional support. This work is needed for replication to be efficient at 
> all, and thus, usable.
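
As a toy sketch of the CREATE-DROP rule above (not the actual design): if an 
object is created and later dropped within the same batch, the create, the 
drop, and every intermediate event on that object are filtered out before the 
batch is handed to a tool like Falcon. Event here is a simplified stand-in for 
the real notification objects:

{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EventNullifier {
  enum Kind { CREATE, DROP, OTHER }

  static final class Event {
    final long id; final Kind kind; final String object;
    Event(long id, Kind kind, String object) {
      this.id = id; this.kind = kind; this.object = object;
    }
  }

  static List<Event> nullifyCreateDrop(List<Event> batch) {
    // Pass 1: find objects created and then dropped within this batch
    // (events arrive in id order, so a matching DROP follows its CREATE).
    Set<String> created = new HashSet<>();
    Set<String> doomed = new HashSet<>();
    for (Event e : batch) {
      if (e.kind == Kind.CREATE) {
        created.add(e.object);
      } else if (e.kind == Kind.DROP && created.contains(e.object)) {
        doomed.add(e.object);
      }
    }
    // Pass 2: null out every event that references a doomed object.
    List<Event> out = new ArrayList<>();
    for (Event e : batch) {
      if (!doomed.contains(e.object)) {
        out.add(e);
      }
    }
    return out;
  }
}
{code}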



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13348) Add Event Nullification support for Replication

2016-09-22 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-13348:

Labels:   (was: gsoc2016)

> Add Event Nullification support for Replication
> ---
>
> Key: HIVE-13348
> URL: https://issues.apache.org/jira/browse/HIVE-13348
> Project: Hive
>  Issue Type: Sub-task
>  Components: Import/Export
>Reporter: Sushanth Sowmyan
>
> Replication, as implemented by HIVE-7973 works as follows:
> a) For every single modification to the Hive metastore, an event gets 
> triggered that logs a notification object.
> b) Replication tools such as Falcon can consume these notification objects as 
> a HCatReplicationTaskIterator from 
> HCatClient.getReplicationTasks(lastEventId, maxEvents, dbName, tableName).
> c) For each event,  we generate statements and distcp requirements for falcon 
> to export, distcp and import to do the replication (along with requisite 
> changes to export and import that would allow state management).
> The big thing missing from this picture is that while it works, it is pretty 
> dumb about how it works in that it will exhaustively process every single 
> event generated, and will try to do the export-distcp-import cycle for all 
> modifications, irrespective of whether or not that will actually get used at 
> import time.
> We need to build some sort of filtering logic which can process a batch of 
> events to identify events that will result in effective no-ops, and to 
> nullify those events from the stream before passing them on. The goal is to 
> minimize the number of events that the tools like Falcon would actually have 
> to process.
> Examples of cases where event nullification would take place:
> a) CREATE-DROP cases: If an object is being created in event#34 that will 
> eventually get dropped in event#47, then there is no point in replicating 
> this along. We simply null out both these events, and also, any other event 
> that references this object between event#34 and event#47.
> b) APPEND-APPEND : Some objects are replicated wholesale, which means every 
> APPEND that occurs would cause a full export of the object in question. At 
> this point, the prior APPENDS would all be supplanted by the last APPEND. 
> Thus, we could nullify all the prior such events. 
> Additional such cases can be inferred by analysis of the Export-Import relay 
> protocol definition at 
> https://issues.apache.org/jira/secure/attachment/12725999/EXIMReplicationReplayProtocol.pdf
>  or by reasoning out various event processing orders possible.
> Replication, as implemented by HIVE-7973 is merely a first step for 
> functional support. This work is needed for replication to be efficient at 
> all, and thus, usable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input

2016-09-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513748#comment-15513748
 ] 

Aihua Xu commented on HIVE-14805:
-

Those tests are not related.

> Subquery inside a view will have the object in the subquery as the direct 
> input 
> 
>
> Key: HIVE-14805
> URL: https://issues.apache.org/jira/browse/HIVE-14805
> Project: Hive
>  Issue Type: Bug
>  Components: Views
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch
>
>
> Here is the repro steps.
> {noformat}
> create table t1(col string);
> create view v1 as select * from t1;
> create view dataview as select * from  (select * from v1) v2;
> select * from dataview;
> {noformat}
> If Hive is configured with an authorization hook like Sentry, it will require 
> access not only to dataview but also to v1, which should not be 
> required.
> The subquery seems to not carry the insideview property from the parent query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513745#comment-15513745
 ] 

Hive QA commented on HIVE-14805:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829855/HIVE-14805.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1273/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1273/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1273/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829855 - PreCommit-HIVE-Build

> Subquery inside a view will have the object in the subquery as the direct 
> input 
> 
>
> Key: HIVE-14805
> URL: https://issues.apache.org/jira/browse/HIVE-14805
> Project: Hive
>  Issue Type: Bug
>  Components: Views
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch
>
>
> Here is the repro steps.
> {noformat}
> create table t1(col string);
> create view v1 as select * from t1;
> create view dataview as select * from  (select * from v1) v2;
> select * from dataview;
> {noformat}
> If Hive is configured with an authorization hook like Sentry, it will require 
> access not only to dataview but also to v1, which should not be 
> required.
> The subquery seems to not carry the insideview property from the parent query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14582) Add trunc(numeric) udf

2016-09-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513700#comment-15513700
 ] 

Ashutosh Chauhan commented on HIVE-14582:
-

I second [~niklaus.xiao]. Overloading the existing trunc should be possible and is 
much more desirable. Any other name would deviate from the SQL standard.
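
For reference, the Oracle-style numeric semantics boil down to truncation 
toward zero at a given scale; a quick sketch with BigDecimal (illustrative 
only, not the attached patch):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TruncSketch {
  // trunc(x, d): cut x to d decimal places without rounding; a negative d
  // truncates to the left of the decimal point, as in Oracle.
  static BigDecimal trunc(BigDecimal x, int d) {
    return x.setScale(d, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    System.out.println(trunc(new BigDecimal("15.79"), 1).toPlainString());   // 15.7
    System.out.println(trunc(new BigDecimal("15.79"), -1).toPlainString());  // 10
    System.out.println(trunc(new BigDecimal("-15.79"), 1).toPlainString());  // -15.7
  }
}
{code}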

> Add trunc(numeric) udf
> --
>
> Key: HIVE-14582
> URL: https://issues.apache.org/jira/browse/HIVE-14582
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-14582.patch
>
>
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14426) Extensive logging on info level in WebHCat

2016-09-22 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513658#comment-15513658
 ] 

Peter Vary commented on HIVE-14426:
---

Same errors as in HIVE-14098, plus some new ones:
{code}
162d161
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
168d166
< org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby2
170d167
< org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
193d189
< org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_skewjoin
195,196d190
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_1
< org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_tests
199d192
< org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1
203,210d195
< org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_3
< org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_4
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_multi_or_projection
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_varchar
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_math_funcs
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp
< 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_ints_casts
270c255
{code}

The new one is related to this 
(https://builds.apache.org/job/PreCommit-HIVE-Build/1272/testReport/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/org_apache_hadoop_hive_cli_TestMiniTezCliDriver/).
My guess is there was an error in one of the executors, since 
TestMiniTezCliDriver was running on other instances (for example: 
https://builds.apache.org/job/PreCommit-HIVE-Build/1272/testReport/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_acid_globallimit/).

So all in all, I think none of the errors are related.

Thanks,
Peter

> Extensive logging on info level in WebHCat
> --
>
> Key: HIVE-14426
> URL: https://issues.apache.org/jira/browse/HIVE-14426
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, 
> HIVE-14426.4.patch, HIVE-14426.5.patch, HIVE-14426.6.patch, 
> HIVE-14426.7.patch, HIVE-14426.8.patch, HIVE-14426.9-branch-2.1.patch, 
> HIVE-14426.9.patch, HIVE-14426.patch
>
>
> There is an extensive logging in WebHCat at info level, and even some 
> sensitive information could be logged



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code

2016-09-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513653#comment-15513653
 ] 

Vihang Karajgaonkar commented on HIVE-9423:
---

Thanks for the patch [~pvary]. This issue has been a pain point for Beeline 
users, and more user-friendly error messages help a lot. The patch looks good to me.

> HiveServer2: Provide the user with different error messages depending on the 
> Thrift client exception code
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513528#comment-15513528
 ] 

Xuefu Zhang commented on HIVE-14029:


+1 on identifying the minimum set.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable (see the sketch below)
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn’t accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of integer
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
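
As a hedged illustration of the first API change listed, Spark 2.x flat-map 
style callbacks (which SparkShuffler#call follows) hand back an Iterator where 
they previously returned an Iterable; the class below is a stand-in, not 
Hive's actual SparkShuffler:

{code}
import java.util.Arrays;
import java.util.Iterator;

public class IteratorMigration {
  // Spark 1.x style: the callback returned an Iterable.
  static Iterable<String> callOld(String row) {
    return Arrays.asList(row.split(","));
  }

  // Spark 2.x style: the same logic now returns an Iterator.
  static Iterator<String> callNew(String row) {
    return Arrays.asList(row.split(",")).iterator();
  }

  public static void main(String[] args) {
    callNew("a,b,c").forEachRemaining(System.out::println);
  }
}
{code}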



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14426) Extensive logging on info level in WebHCat

2016-09-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513599#comment-15513599
 ] 

Hive QA commented on HIVE-14426:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829802/HIVE-14426.9-branch-2.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 269 failed/errored test(s), 10355 tests 
executed
*Failed tests:*
{noformat}
249_TestHWISessionManager - did not produce a TEST-*.xml file
382_TestMsgBusConnection - did not produce a TEST-*.xml file
772_TestHiveDruidQueryBasedInputFormat - did not produce a TEST-*.xml file
773_TestDruidSerDe - did not produce a TEST-*.xml file
783_TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file
784_TestJdbcWithMiniKdc - did not produce a TEST-*.xml file
785_TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file
787_TestJdbcWithDBTokenStore - did not produce a TEST-*.xml file
788_TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file
789_TestJdbcNonKrbSASLWithMiniKdc - did not produce a TEST-*.xml file
791_TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_json_serde1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13

[jira] [Updated] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code

2016-09-22 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-9423:
-
Summary: HiveServer2: Provide the user with different error messages 
depending on the Thrift client exception code  (was: HiveServer2: Implement 
some admission control mechanism for graceful degradation when resources are 
exhausted)

> HiveServer2: Provide the user with different error messages depending on the 
> Thrift client exception code
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513552#comment-15513552
 ] 

Xuefu Zhang commented on HIVE-9423:
---

Could we update the JIRA title to reflect what the patch is actually doing? 
Thanks.

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-22 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-9423:
-
Attachment: HIVE-9423.4.patch

Changed the messages according to [~ctang.ma]'s suggestion.

Thanks,
Peter

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513539#comment-15513539
 ] 

Rui Li commented on HIVE-14029:
---

I'm using:
{noformat}
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
{noformat}

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn’t accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of integer
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513534#comment-15513534
 ] 

Sergio Peña commented on HIVE-14029:


Which JDK are you using? Jenkins is using JDK 8.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn’t accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of integer
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513473#comment-15513473
 ] 

Rui Li commented on HIVE-14029:
---

Seems we have two {{javax.ws.rs.core.UriInfo}} interfaces from two jars: 
javax.ws.rs-api and jersey-core. Before the patch, we only had the one from 
jersey-core. Maybe there's some conflict in the dependency upgrade. We need to 
fix it because it breaks the build.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn’t accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of integer
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-22 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513451#comment-15513451
 ] 

Chaoyu Tang commented on HIVE-9423:
---

+1

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513430#comment-15513430
 ] 

Sergio Peña commented on HIVE-14029:


Sorry Fer, I meant 2.2 :P. I got confused with numbers.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn’t accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of integer
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14582) Add trunc(numeric) udf

2016-09-22 Thread Niklaus Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513395#comment-15513395
 ] 

Niklaus Xiao commented on HIVE-14582:
-

Is it possible to add the {{trunc(number)}} logic to the existing {{trunc(date)}} 
implementation?

> Add trunc(numeric) udf
> --
>
> Key: HIVE-14582
> URL: https://issues.apache.org/jira/browse/HIVE-14582
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-14582.patch
>
>
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14754) Track the queries execution lifecycle times

2016-09-22 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14754:
---
Component/s: (was: Metastore)

> Track the queries execution lifecycle times
> ---
>
> Key: HIVE-14754
> URL: https://issues.apache.org/jira/browse/HIVE-14754
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> We should be able to track the number of queries being compiled/executed at any 
> given time, as well as the duration of the execution and compilation phases.
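
A toy sketch of the two measurements this asks for, assuming a gauge for 
queries currently inside a phase plus a per-pass duration; the real change 
would wire into HiveServer2's metrics system, and all names here are invented:

{code}
import java.util.concurrent.atomic.AtomicInteger;

public class PhaseTracker {
  private final AtomicInteger inFlight = new AtomicInteger();

  // Runs one compilation/execution phase and returns its duration in ms,
  // keeping the in-flight gauge accurate even if the phase throws.
  public long timePhase(Runnable phase) {
    inFlight.incrementAndGet();
    long start = System.nanoTime();
    try {
      phase.run();
      return (System.nanoTime() - start) / 1_000_000;
    } finally {
      inFlight.decrementAndGet();
    }
  }

  // Number of queries inside this phase right now.
  public int currentlyInPhase() {
    return inFlight.get();
  }
}
{code}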



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14754) Track the queries execution lifecycle times

2016-09-22 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-14754:
--

Assignee: Barna Zsombor Klara

> Track the queries execution lifecycle times
> ---
>
> Key: HIVE-14754
> URL: https://issues.apache.org/jira/browse/HIVE-14754
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> We should be able to track the number of queries being compiled/executed at any 
> given time, as well as the duration of the execution and compilation phases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-22 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513372#comment-15513372
 ] 

Peter Vary commented on HIVE-9423:
--

[~ctang.ma] That is a good question! :)
What about these messages:
{code}
+hs2-unexpected-end-of-file: Unexpected end of file when reading from HS2 
server. The root \
+cause might be too many concurrent connections. Please ask the administrator 
to check the number \
+of active connections, and adjust hive.server2.thrift.max.worker.threads if 
applicable.
+hs2-could-not-open-connection: Could not open connection to the HS2 server. 
Please check the \
+server URI and if the URI is correct, then ask the administrator to check the 
server status.
{code}

Thanks,
Peter

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14358) Add metrics for number of queries executed for each execution engine (mr, spark, tez)

2016-09-22 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513350#comment-15513350
 ] 

Barna Zsombor Klara commented on HIVE-14358:


Failures seem unrelated; most were failing before, and the one test that failed 
in this run is flaky.

> Add metrics for number of queries executed for each execution engine (mr, 
> spark, tez)
> -
>
> Key: HIVE-14358
> URL: https://issues.apache.org/jira/browse/HIVE-14358
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 2.1.0
>Reporter: Lenni Kuff
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14358.patch
>
>
> HiveServer2 currently has a metric for the total number of queries run since 
> the last restart, but it would be useful to also have metrics for the number of 
> queries run for each execution engine. This would improve supportability by 
> allowing users to get a high-level understanding of what workloads have been 
> running on the server. 
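
A self-contained sketch of per-engine counters (the actual patch would go 
through Hive's metrics subsystem; the class and method names below are 
invented):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class EngineQueryCounters {
  private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

  // Bump the counter for the engine ("mr", "spark", "tez") a query runs on.
  public void queryStarted(String engine) {
    counters.computeIfAbsent(engine, k -> new LongAdder()).increment();
  }

  public long count(String engine) {
    LongAdder adder = counters.get(engine);
    return adder == null ? 0L : adder.sum();
  }

  public static void main(String[] args) {
    EngineQueryCounters metrics = new EngineQueryCounters();
    metrics.queryStarted("tez");
    metrics.queryStarted("tez");
    metrics.queryStarted("mr");
    System.out.println("tez=" + metrics.count("tez") + " mr=" + metrics.count("mr"));
  }
}
{code}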



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-22 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513336#comment-15513336
 ] 

Chaoyu Tang commented on HIVE-9423:
---

[~pvary] I have a small question about the message and suggestion presented to 
Beeline users when they run into a login timeout issue, such as:
{code}
+hs2-unexpected-end-of-file: Unexpected end of file when reading from HS2 
server. The root \
+cause might be too many concurrent connections. Please check the number of 
active \
+connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
{code}
Do you think these Beeline users have the privilege to "check the number of 
active connections, and adjust hive.server2.thrift.max.worker.threads if 
applicable"?

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when # of client 
> connections is greater than   {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user gets aware of the problem and can take 
> appropriate steps (either close existing connections or bump of the config 
> value or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behaviour of background thread 
> pool to have a well defined behavior on the the pool getting exhausted. 
> Ideally implementing some form of general admission control will be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available and display graceful degradation under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input

2016-09-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513281#comment-15513281
 ] 

Aihua Xu edited comment on HIVE-14805 at 9/22/16 1:24 PM:
--

Patch-2: update 3 tests' baseline. Seems that's also the issue [~niklaus.xiao] 
mentioned in HIVE-10875.


was (Author: aihuaxu):
Patch-3: update 3 tests' baseline. Seems that's also the issue [~niklaus.xiao] 
mentioned in HIVE-10875.

> Subquery inside a view will have the object in the subquery as the direct 
> input 
> 
>
> Key: HIVE-14805
> URL: https://issues.apache.org/jira/browse/HIVE-14805
> Project: Hive
>  Issue Type: Bug
>  Components: Views
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch
>
>
> Here is the repro steps.
> {noformat}
> create table t1(col string);
> create view v1 as select * from t1;
> create view dataview as select * from  (select * from v1) v2;
> select * from dataview;
> {noformat}
> If hive is configured with authorization hook like Sentry, it will require 
> the access not only for dataview but also for v1, which should not be 
> required.
> The subquery seems to not carry insideview property from the parent query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input

2016-09-22 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513294#comment-15513294
 ] 

Yongzhi Chen commented on HIVE-14805:
-

The PATCH looks good.  +1

> Subquery inside a view will have the object in the subquery as the direct 
> input 
> 
>
> Key: HIVE-14805
> URL: https://issues.apache.org/jira/browse/HIVE-14805
> Project: Hive
>  Issue Type: Bug
>  Components: Views
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch
>
>
> Here is the repro steps.
> {noformat}
> create table t1(col string);
> create view v1 as select * from t1;
> create view dataview as select * from  (select * from v1) v2;
> select * from dataview;
> {noformat}
> If hive is configured with authorization hook like Sentry, it will require 
> the access not only for dataview but also for v1, which should not be 
> required.
> The subquery seems to not carry insideview property from the parent query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >