[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515494#comment-15515494 ]

Lefty Leverenz commented on HIVE-14801:
---------------------------------------

[~thejas], you committed this to master, so it needs a status update. (Commit 0c392b185d98b4fb380a33a535b5f528625a47e8.)

> improve TestPartitionNameWhitelistValidation stability
> ------------------------------------------------------
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
> Issue Type: Bug
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
> TestPartitionNameWhitelistValidation uses a remote metastore. However, there can be multiple issues around startup of a remote metastore, including race conditions in finding an available port. In addition, all the initialization done at startup of the remote metastore is likely to make the test case take more time.
> This test case doesn't need a remote metastore, so it should be moved to using an embedded metastore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14580) Introduce || operator
[ https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515271#comment-15515271 ]

Hive QA commented on HIVE-14580:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829933/HIVE-14580.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1283/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1283/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1283/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829933 - PreCommit-HIVE-Build

> Introduce || operator
> ---------------------
>
> Key: HIVE-14580
> URL: https://issues.apache.org/jira/browse/HIVE-14580
> Project: Hive
> Issue Type: Sub-task
> Components: SQL
> Reporter: Ashutosh Chauhan
> Assignee: Zoltan Haindrich
> Attachments: HIVE-14580.1.patch
>
> Functionally equivalent to the concat() UDF, but the standard allows the use of || for string concatenation.
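The equivalence described in HIVE-14580 ("a || b" behaving like concat(a, b)) can be sketched in plain Java. The concat helper below is a hypothetical stand-in for Hive's concat() UDF semantics (which yields NULL when any argument is NULL), not the real implementation; the operator is then just syntactic sugar that nests such calls:

```java
public class ConcatOperatorDemo {
    // Hypothetical stand-in for the concat() UDF: null-propagating string concat.
    static String concat(String a, String b) {
        if (a == null || b == null) {
            return null;
        }
        return a + b;
    }

    public static void main(String[] args) {
        // "SELECT 'hive' || '-' || 'sql'" would evaluate as nested concat calls:
        System.out.println(concat(concat("hive", "-"), "sql"));
        // Any NULL operand makes the whole expression NULL, as with concat():
        System.out.println(concat(null, "sql"));
    }
}
```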
[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script
[ https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515196#comment-15515196 ]

Ferdinand Xu commented on HIVE-5867:
------------------------------------

Hi [~JonnyR], can you attach the file as an attachment on this ticket to trigger the Jenkins job?

> JDBC driver and beeline should support executing an initial SQL script
> ----------------------------------------------------------------------
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
> Issue Type: Improvement
> Components: Clients, JDBC
> Reporter: Prasad Mujumdar
> Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch
>
> HiveCLI supports the .hiverc script that is executed at the start of the session. This is helpful for things like registering UDFs, session-specific configs, etc.
> This functionality is missing for beeline and JDBC clients. It would be useful for the JDBC driver to support an init script with SQL statements that is automatically executed after connection. The script path can be specified via the JDBC connection URL. For example:
> {noformat}
> jdbc:hive2://localhost:1/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added to Beeline's command line options like "-i /home/user1/scripts/init.sql"
> To help transition from HiveCLI to Beeline, we can keep the default init script as $HOME/.hiverc
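As a sketch of what executing such an init script could involve on the client side, here is a minimal, hypothetical statement splitter (not the patch's actual code). A real implementation would also need to handle quoted strings, comments, and escaped semicolons, and would run each statement over the open JDBC connection:

```java
import java.util.ArrayList;
import java.util.List;

public class InitScriptDemo {
    // Naive split of a script into individual statements on ';'.
    // Hypothetical helper for illustration only.
    static List<String> splitStatements(String script) {
        List<String> stmts = new ArrayList<>();
        for (String part : script.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                stmts.add(trimmed);
            }
        }
        return stmts;
    }

    public static void main(String[] args) {
        String script = "set hive.exec.parallel=true;\n"
                      + "add jar /home/user1/udfs/my-udfs.jar;\n";
        for (String stmt : splitStatements(script)) {
            // After the connection is opened, each statement would be run via
            // connection.createStatement().execute(stmt);
            System.out.println(stmt);
        }
    }
}
```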
[jira] [Commented] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515155#comment-15515155 ]

Rui Li commented on HIVE-14825:
-------------------------------

Thanks [~Ferd] for tracking this. I expect the minimum set to be fairly small :)

> Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-14825
> URL: https://issues.apache.org/jira/browse/HIVE-14825
> Project: Hive
> Issue Type: Bug
> Reporter: Ferdinand Xu
>
> Considering that there's no assembly jar for Spark since 2.0.0, we should figure out the minimum set of required jars for HoS to work after bumping up to Spark 2.0.0. This way, users can decide whether they want to add just the required jars, or all the jars under Spark's dir for convenience.
[jira] [Commented] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly
[ https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515139#comment-15515139 ]

Hive QA commented on HIVE-14820:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829923/HIVE-14820.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10556 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.hcatalog.mapreduce.TestHCatMultiOutputFormat.testOutputFormat
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1282/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1282/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1282/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829923 - PreCommit-HIVE-Build

> RPC server for spark inside HS2 is not getting server address properly
> ----------------------------------------------------------------------
>
> Key: HIVE-14820
> URL: https://issues.apache.org/jira/browse/HIVE-14820
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 2.0.1
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Attachments: HIVE-14820.1.patch
>
> When hive.spark.client.rpc.server.address is configured, this property is not retrieved properly, because we get the value with {{String hiveHost = config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}}, which always returns null in the getServerAddress() call of RpcConfiguration.java. Rather, it should be {{String hiveHost = config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}.
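The root cause described in HIVE-14820 is easy to reproduce with a plain map, since the configuration keys its properties by the varname string, not the enum constant. The enum below is a hypothetical stand-in for HiveConf.ConfVars, not Hive's actual class:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfLookupDemo {
    // Hypothetical stand-in for HiveConf.ConfVars: each constant carries the
    // actual property key ("varname") used in the configuration map.
    enum ConfVars {
        SPARK_RPC_SERVER_ADDRESS("hive.spark.client.rpc.server.address");
        final String varname;
        ConfVars(String v) { this.varname = v; }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("hive.spark.client.rpc.server.address", "hs2-host.example.com");

        // Buggy lookup: Map.get(Object) happily accepts the enum constant,
        // but no entry is keyed by the enum itself, so this always returns null.
        String buggy = config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS);
        // Fixed lookup: use the property key (the varname) as the map key.
        String fixed = config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);

        System.out.println("buggy = " + buggy);
        System.out.println("fixed = " + fixed);
    }
}
```

The compiler does not catch the mistake because Map.get takes Object, which is exactly why the bug went unnoticed.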
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515131#comment-15515131 ]

Ferdinand Xu commented on HIVE-14029:
-------------------------------------

Hi [~lirui], [~xuefuz], HIVE-14825 was created to address this.

> Update Spark version to 2.0.0
> -----------------------------
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
> Issue Type: Bug
> Reporter: Ferdinand Xu
> Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, HIVE-14029.patch
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn't accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics return long instead of integer
> * Dependency upgrades:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
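Of the API updates listed above, the SparkShuffler#call change (Iterable to Iterator) is the kind that breaks existing callers, since a for-each loop only accepts an Iterable. A minimal illustration, with a made-up method standing in for the real Spark interface:

```java
import java.util.Arrays;
import java.util.Iterator;

public class IteratorReturnDemo {
    // Before Spark 2.0 the shuffle function returned Iterable<T>; now it
    // returns Iterator<T>. The method name and type here are illustrative,
    // not the real Spark API surface.
    static Iterator<String> call() {
        return Arrays.asList("a", "b", "c").iterator();
    }

    public static void main(String[] args) {
        // An Iterator cannot be used in a for-each loop directly; iterate
        // explicitly instead:
        Iterator<String> it = call();
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {
            sb.append(it.next());
        }
        System.out.println(sb);

        // Or, when something downstream still wants an Iterable, wrap the
        // producer (Iterable is a functional interface in Java 8+):
        Iterable<String> wrapped = IteratorReturnDemo::call;
        for (String s : wrapped) {
            System.out.println(s);
        }
    }
}
```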
[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu updated HIVE-14029:
--------------------------------
    Attachment: HIVE-14029.5.patch

Hi [~spena], it's odd that Jenkins can build it successfully. Hi [~lirui], I excluded {{javax.ws.rs}} imported by spark-core in the 5th patch.

> Update Spark version to 2.0.0
> -----------------------------
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
> Issue Type: Bug
> Reporter: Ferdinand Xu
> Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, HIVE-14029.patch
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn't accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics return long instead of integer
> * Dependency upgrades:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
[jira] [Updated] (HIVE-14818) Reduce number of retries while starting HiveServer for tests
[ https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-14818:
----------------------------------
    Attachment: HIVE-14818.02.patch

Updated patch to fix the enum reference order. Agree with 30m being too much - don't know why any restarts are attempted, but I don't plan to change that here.

> Reduce number of retries while starting HiveServer for tests
> ------------------------------------------------------------
>
> Key: HIVE-14818
> URL: https://issues.apache.org/jira/browse/HIVE-14818
> Project: Hive
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14818.01.patch, HIVE-14818.02.patch
>
> The current setting is 30 retries, with a 1-minute sleep between each one.
> The settings are likely bad for a production cluster as well. For tests, this should be a lot lower.
[jira] [Commented] (HIVE-14818) Reduce number of retries while starting HiveServer for tests
[ https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515020#comment-15515020 ]

Hive QA commented on HIVE-14818:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829919/HIVE-14818.01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1281/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1281/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1281/

Messages:
{noformat}
This message was trimmed, see log for full details

main:
    [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/storage-api/target/tmp
    [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/storage-api/target/warehouse
    [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/storage-api/target/tmp/conf
     [copy] Copying 15 files to /data/hive-ptest/working/apache-github-source-source/storage-api/target/tmp/conf
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-storage-api ---
[INFO] Compiling 7 source files to /data/hive-ptest/working/apache-github-source-source/storage-api/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ hive-storage-api ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ hive-storage-api ---
[INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/storage-api/target/hive-storage-api-2.2.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-storage-api ---
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-storage-api ---
[INFO] Installing /data/hive-ptest/working/apache-github-source-source/storage-api/target/hive-storage-api-2.2.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-storage-api/2.2.0-SNAPSHOT/hive-storage-api-2.2.0-SNAPSHOT.jar
[INFO] Installing /data/hive-ptest/working/apache-github-source-source/storage-api/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-storage-api/2.2.0-SNAPSHOT/hive-storage-api-2.2.0-SNAPSHOT.pom
[INFO]
[INFO] Building Hive ORC 2.2.0-SNAPSHOT
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-orc ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/orc/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/orc (includes = [datanucleus.log, derby.log], excludes = [])
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-orc ---
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-orc ---
[INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/orc/src/gen/protobuf-java added.
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-orc ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-orc ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/orc/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-orc ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-orc ---
[INFO] Compiling 71 source files to /data/hive-ptest/working/apache-github-source-source/orc/target/classes
[WARNING] /data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/tools/FileDump.java: Some input files use or override a deprecated API.
[WARNING] /data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/tools/FileDump.java: Recompile with -Xlint:deprecation for details.
[WARNING] /data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/impl/RecordReaderImpl.java: /data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/impl/RecordReaderImpl.java uses unchecked or unsafe operations.
[WARNING] /data/hive-ptest/working/apache-github-source-source/orc/src/java/org/apache/orc/impl/RecordReaderImpl.java: Recompile with -Xlint:unchecked for details.
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-orc ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 7 resources
[INFO] Copying 3 resources
[INFO]
{noformat}
[jira] [Commented] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType
[ https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515017#comment-15515017 ]

Hive QA commented on HIVE-14819:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829917/HIVE-14819.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10558 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1280/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1280/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1280/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829917 - PreCommit-HIVE-Build

> FunctionInfo for permanent functions shows TEMPORARY FunctionType
> -----------------------------------------------------------------
>
> Key: HIVE-14819
> URL: https://issues.apache.org/jira/browse/HIVE-14819
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 2.1.0
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-14819.1.patch
>
> The FunctionInfo has a FunctionType field which describes whether the function is a builtin/persistent/temporary function. But for permanent functions, the FunctionInfo being returned by the FunctionRegistry shows the type to be TEMPORARY.
> This affects things that may depend on the function type, for example LlapDecider, which will allow builtin/persistent UDFs to be used in LLAP but not temporary functions.
[jira] [Updated] (HIVE-14824) Separate fstype from cluster type in QTestUtil
[ https://issues.apache.org/jira/browse/HIVE-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-14824:
----------------------------------
    Attachment: HIVE-14824.01.patch

[~prasanth_j] - please review. After this, to run TestEncHdfsDriver on tez, the following change is adequate in CliConfigs. Similarly for llap, spark, etc.
{code}
-setHiveConfDir("data/conf");
-setClusterType(MiniClusterType.mr);
+setHiveConfDir("data/conf/tez");
+setClusterType(MiniClusterType.tez);
{code}

> Separate fstype from cluster type in QTestUtil
> ----------------------------------------------
>
> Key: HIVE-14824
> URL: https://issues.apache.org/jira/browse/HIVE-14824
> Project: Hive
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14824.01.patch
>
> The QTestUtil cluster type encodes the file system. e.g. MiniClusterType.encrypted means mr + encrypted hdfs, spark means file://, mr means hdfs, etc.
> These can be separated out. e.g. To add tests for tez against encrypted, and llap against encrypted - I'd need to introduce 2 new cluster types.
> Instead it's better to separate the storage into its own types.
[jira] [Updated] (HIVE-14824) Separate fstype from cluster type in QTestUtil
[ https://issues.apache.org/jira/browse/HIVE-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-14824:
----------------------------------
    Status: Patch Available (was: Open)

> Separate fstype from cluster type in QTestUtil
> ----------------------------------------------
>
> Key: HIVE-14824
> URL: https://issues.apache.org/jira/browse/HIVE-14824
> Project: Hive
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14824.01.patch
>
> The QTestUtil cluster type encodes the file system. e.g. MiniClusterType.encrypted means mr + encrypted hdfs, spark means file://, mr means hdfs, etc.
> These can be separated out. e.g. To add tests for tez against encrypted, and llap against encrypted - I'd need to introduce 2 new cluster types.
> Instead it's better to separate the storage into its own types.
[jira] [Resolved] (HIVE-14823) ZooKeeperHiveLockManager logs WAY too much on INFO level
[ https://issues.apache.org/jira/browse/HIVE-14823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin resolved HIVE-14823.
-------------------------------------
    Resolution: Duplicate

Nm, dup of HIVE-12966

> ZooKeeperHiveLockManager logs WAY too much on INFO level
> --------------------------------------------------------
>
> Key: HIVE-14823
> URL: https://issues.apache.org/jira/browse/HIVE-14823
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
>
> about to release lock ... can be logged 1 times for large tables. Should be DEBUG or even TRACE
[jira] [Assigned] (HIVE-14823) ZooKeeperHiveLockManager logs WAY too much on INFO level
[ https://issues.apache.org/jira/browse/HIVE-14823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin reassigned HIVE-14823:
---------------------------------------
    Assignee: Sergey Shelukhin

> ZooKeeperHiveLockManager logs WAY too much on INFO level
> --------------------------------------------------------
>
> Key: HIVE-14823
> URL: https://issues.apache.org/jira/browse/HIVE-14823
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
>
> about to release lock ... can be logged 1 times for large tables. Should be DEBUG or even TRACE
[jira] [Assigned] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width
[ https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin reassigned HIVE-13098:
---------------------------------------
    Assignee: Sergey Shelukhin

> Add a strict check for when the decimal gets converted to null due to insufficient width
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-13098
> URL: https://issues.apache.org/jira/browse/HIVE-13098
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
>
> When e.g. 99 is selected as decimal(5,0), the result is null. This can be problematic, esp. if the data is written to a table and lost without the user realizing it. There should be an option to error out in such cases instead; it should probably be on by default and the error message should instruct the user on how to disable it.
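A rough sketch of the behavior described in HIVE-13098, using BigDecimal: enforce() below is a hypothetical helper (not Hive's actual enforcePrecisionScale) that mirrors the "convert to null on insufficient width" behavior; the strict check proposed in the ticket would throw an error instead of returning null.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalWidthCheck {
    // Returns null when the value needs more integer digits than decimal(p,s)
    // allows -- the silent data loss the ticket wants an option to error on.
    static BigDecimal enforce(BigDecimal v, int precision, int scale) {
        BigDecimal scaled = v.setScale(scale, RoundingMode.HALF_UP);
        // Digits before the decimal point must fit in (precision - scale).
        int intDigits = scaled.precision() - scaled.scale();
        return intDigits <= precision - scale ? scaled : null;
    }

    public static void main(String[] args) {
        System.out.println(enforce(new BigDecimal("123"), 5, 0));     // fits decimal(5,0)
        System.out.println(enforce(new BigDecimal("1234567"), 5, 0)); // too wide -> null
    }
}
```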
[jira] [Commented] (HIVE-14426) Extensive logging on info level in WebHCat
[ https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514982#comment-15514982 ]

Eugene Koifman commented on HIVE-14426:
---------------------------------------

FYI, WebHCat doesn't really have any JUnit tests, so these changes (at least the WebHCat part) are not being tested by the build bot. Most WebHCat tests are under hcatalog/src/test/e2e/templeton/ and require a running Hadoop instance.

> Extensive logging on info level in WebHCat
> ------------------------------------------
>
> Key: HIVE-14426
> URL: https://issues.apache.org/jira/browse/HIVE-14426
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Minor
> Fix For: 2.2.0
> Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, HIVE-14426.4.patch, HIVE-14426.5.patch, HIVE-14426.6.patch, HIVE-14426.7.patch, HIVE-14426.8.patch, HIVE-14426.9-branch-2.1.patch, HIVE-14426.9.patch, HIVE-14426.patch
>
> There is extensive logging in WebHCat at INFO level, and even some sensitive information could be logged
[jira] [Updated] (HIVE-14821) build q test
[ https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-14821:
----------------------------------
    Attachment: HIVE-14821.2.patch

> build q test
> ------------
>
> Key: HIVE-14821
> URL: https://issues.apache.org/jira/browse/HIVE-14821
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Attachments: HIVE-14821.1.patch, HIVE-14821.2.patch, HIVE-14821.patch
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514904#comment-15514904 ]

Zhiyuan Yang commented on HIVE-14731:
-------------------------------------

Test failures are irrelevant.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> ----------------------------------------------------------------
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch
>
> Given that the cartesian product edge is available in Tez now (see TEZ-3230), let's integrate it into Hive on Tez. This allows us to have more than one reducer in cross product queries.
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514895#comment-15514895 ]

Hive QA commented on HIVE-14731:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829909/HIVE-14731.8.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10522 tests executed

*Failed tests:*
{noformat}
TestMiniLlapCliDriver-tez_schema_evolution.q-tez_join.q-file_with_header_footer.q-and-27-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1279/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1279/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1279/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829909 - PreCommit-HIVE-Build

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> ----------------------------------------------------------------
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch
>
> Given that the cartesian product edge is available in Tez now (see TEZ-3230), let's integrate it into Hive on Tez. This allows us to have more than one reducer in cross product queries.
[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514737#comment-15514737 ]

Szehon Ho commented on HIVE-14713:
----------------------------------

I think there is a 24-hour wait after the last +1 to get merged (at least last time I checked). Feel free to ping again if it is forgotten.

> LDAP Authentication Provider should be covered with unit tests
> --------------------------------------------------------------
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
> Issue Type: Test
> Components: Authentication, Tests
> Affects Versions: 2.1.0
> Reporter: Illya Yalovyy
> Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, HIVE-14713.3.patch
>
> Currently the LdapAuthenticationProviderImpl class is not covered with unit tests. To make this class testable, some minor refactoring will be required.
[jira] [Updated] (HIVE-14821) build q test
[ https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-14821:
----------------------------------
    Attachment: HIVE-14821.1.patch

> build q test
> ------------
>
> Key: HIVE-14821
> URL: https://issues.apache.org/jira/browse/HIVE-14821
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Attachments: HIVE-14821.1.patch, HIVE-14821.patch
[jira] [Commented] (HIVE-14817) Shutdown the SessionManager timeoutChecker thread properly upon shutdown
[ https://issues.apache.org/jira/browse/HIVE-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514671#comment-15514671 ] Hive QA commented on HIVE-14817: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829914/HIVE-14817.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10555 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1278/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1278/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1278/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12829914 - PreCommit-HIVE-Build > Shutdown the SessionManager timeoutChecker thread properly upon shutdown > > > Key: HIVE-14817 > URL: https://issues.apache.org/jira/browse/HIVE-14817 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14817.01.patch > > > Shutdown for SessionManager waits 10seconds for all threads on the > threadpoolExecutor to shutdown correctly. 
> The cleaner thread - with default settings - will take 6 hours to shut down, > so essentially any shutdown of HS2 is always delayed by 10s. > The cleaner thread should be shut down properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
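The fix described above amounts to interrupting the long-sleeping checker thread instead of waiting out its next wake-up. A minimal sketch of that pattern follows; the class and method names are assumptions for illustration, not the actual Hive SessionManager code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch (not Hive's SessionManager): a cleaner task scheduled
// on its own executor is stopped promptly via shutdownNow(), which interrupts
// the worker rather than waiting for its multi-hour sleep to elapse.
public class CleanerShutdown {
    public static boolean stopCleaner(ScheduledExecutorService cleaner, long waitMillis) {
        cleaner.shutdownNow(); // interrupts the sleeping cleaner thread
        try {
            return cleaner.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        // Simulates a timeout checker whose next run is hours away.
        cleaner.scheduleWithFixedDelay(() -> { }, 0, 6, TimeUnit.HOURS);
        System.out.println(stopCleaner(cleaner, 1000));
    }
}
```

Because shutdownNow() interrupts the worker and discards the pending run, awaitTermination returns almost immediately even though the next cleanup was scheduled hours in the future.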
[jira] [Updated] (HIVE-14821) build q test
[ https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14821: -- Status: Patch Available (was: Open) > build q test > > > Key: HIVE-14821 > URL: https://issues.apache.org/jira/browse/HIVE-14821 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14821.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514582#comment-15514582 ] Illya Yalovyy commented on HIVE-14713: -- [~szehon], [~ctang.ma], The CR got a "ship it"; what is the next step to get this patch accepted? > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, > HIVE-14713.3.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14821) build q test
[ https://issues.apache.org/jira/browse/HIVE-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14821: -- Attachment: HIVE-14821.patch > build q test > > > Key: HIVE-14821 > URL: https://issues.apache.org/jira/browse/HIVE-14821 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14821.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514563#comment-15514563 ] Aihua Xu commented on HIVE-12222: - Thanks Xuefu for reviewing. 1. Currently I didn't try to handle space in the string. If it's not configured properly, it will fall back to 0 (which is a random port). Do you think we should? Seems we are strict when we handle the entry in the hive config. 2. You are right. I thought we should do that but I forgot to add that logic when implementing it. I will add that. > Define port range in property for RPCServer > --- > > Key: HIVE-12222 > URL: https://issues.apache.org/jira/browse/HIVE-12222 > Project: Hive > Issue Type: Improvement > Components: CLI, Spark >Affects Versions: 1.2.1 > Environment: Apache Hadoop 2.7.0 > Apache Hive 1.2.1 > Apache Spark 1.5.1 >Reporter: Andrew Lee >Assignee: Aihua Xu > Attachments: HIVE-12222.1.patch > > > Creating this JIRA after discussing with Xuefu on the dev mailing list. Would > need some help to review and update the fields in this JIRA ticket, thanks. > I notice that in > ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java > The port number is assigned with 0 which means it will be a random port every > time when the RPC Server is created to talk to Spark in the same session. > Because of this, this is causing problems to configure firewall between the > HiveCLI RPC Server and Spark due to unpredictable port numbers here. In other > words, users need to open all hive ports range > from Data Node => HiveCLI (edge node). 
> {code} > this.channel = new ServerBootstrap() > .group(group) > .channel(NioServerSocketChannel.class) > .childHandler(new ChannelInitializer<SocketChannel>() { > @Override > public void initChannel(SocketChannel ch) throws Exception { > SaslServerHandler saslHandler = new SaslServerHandler(config); > final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, > group); > saslHandler.rpc = newRpc; > Runnable cancelTask = new Runnable() { > @Override > public void run() { > LOG.warn("Timed out waiting for hello from client."); > newRpc.close(); > } > }; > saslHandler.cancelTask = group.schedule(cancelTask, > RpcServer.this.config.getServerConnectTimeoutMs(), > TimeUnit.MILLISECONDS); > } > }) > {code} > 2 Main reasons. > - Most users (what I see and encounter) use HiveCLI as a command line tool, > and in order to use that, they need to login to the edge node (via SSH). Now, > here comes the interesting part. > Could be true or not, but this is what I observe and encounter from time to > time. Most users will abuse the resource on that edge node (increasing > HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflow, > etc), this may cause the HS2 process to run into OOME, choke and die, etc. > various resource issues including others like login, etc. > - Analyst connects to Hive via HS2 + ODBC. So HS2 needs to be highly > available. This makes sense to run it on the gateway node or a service node > and separated from the HiveCLI. > The logs are located in different location, monitoring and auditing is easier > to run HS2 with a daemon user account, etc. so we don't want users to run > HiveCLI where HS2 is running. > It's better to isolate the resource this way to avoid any memory, file > handlers, disk space, issues. > From a security standpoint, > - Since users can login to edge node (via SSH), the security on the edge node > needs to be fortified and enhanced. Therefore, all the FW comes in and > auditing. 
> - Regulation/compliance for auditing is another requirement to monitor all > traffic, specifying ports and locking down the ports makes it easier since we > can focus > on a range to monitor and audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14580) Introduce || operator
[ https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-14580: Attachment: HIVE-14580.1.patch I didn't want to "really" introduce a full new operator - and possibly open bug possibilities because of concat/|| implementation differences; so I looked into creating an alias for the concat() udf, which already has optimization & vectorization support. My options were: * purely antlr based - I failed with this approach * minor antlr change + ast rewrite - the chosen path for the rewrite. I've seen a few places where I can add this... but only {{SemanticAnalyzer.processPositionAlias}} looked promising - there are other places, but I think {{TypeCheckProcFactory}} would be a bit late... and adding this anywhere to the optimization-related rewrites would be inappropriate, because this is not an optimization... I've done a minor refactor and split {{processPositionAlias}} from its walk logic - which I'm using to dispatch the concatenate rewrites too. [~pxiong] what do you think about it? > Introduce || operator > - > > Key: HIVE-14580 > URL: https://issues.apache.org/jira/browse/HIVE-14580 > Project: Hive > Issue Type: Sub-task > Components: SQL >Reporter: Ashutosh Chauhan >Assignee: Zoltan Haindrich > Attachments: HIVE-14580.1.patch > > > Functionally equivalent to concat() udf. But standard allows usage of || for > string concatenations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
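The "minor antlr change + ast rewrite" idea can be illustrated on a toy expression tree. This is a hypothetical sketch, not the HIVE-14580 patch; the node classes and names are invented. A post-parse walk replaces each `||` node with a call to the existing concat() function, so concat's existing optimization and vectorization support is reused instead of implementing a new operator:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Toy AST rewrite (illustrative only): "a || b || c" parses left-associatively
// and each "||" node is rewritten into a concat() call.
public class ConcatRewrite {
    static abstract class Node { }
    static class Lit extends Node {
        final String value;
        Lit(String value) { this.value = value; }
    }
    static class BinOp extends Node {
        final String op; final Node left, right;
        BinOp(String op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    }
    static class Call extends Node {
        final String fn; final List<Node> args;
        Call(String fn, List<Node> args) { this.fn = fn; this.args = args; }
    }

    // Bottom-up walk: rewrite children first, then alias "||" to concat().
    static Node rewrite(Node n) {
        if (n instanceof BinOp) {
            BinOp b = (BinOp) n;
            Node l = rewrite(b.left), r = rewrite(b.right);
            if ("||".equals(b.op)) return new Call("concat", Arrays.asList(l, r));
            return new BinOp(b.op, l, r);
        }
        return n;
    }

    static String show(Node n) {
        if (n instanceof Lit) return ((Lit) n).value;
        if (n instanceof BinOp) {
            BinOp b = (BinOp) n;
            return show(b.left) + " " + b.op + " " + show(b.right);
        }
        Call c = (Call) n;
        StringJoiner j = new StringJoiner(", ", c.fn + "(", ")");
        for (Node a : c.args) j.add(show(a));
        return j.toString();
    }
}
```

Rewriting `a || b || c` this way yields `concat(concat(a, b), c)`, which is why differences between a fresh `||` implementation and concat() cannot creep in.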
[jira] [Updated] (HIVE-14580) Introduce || operator
[ https://issues.apache.org/jira/browse/HIVE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-14580: Status: Patch Available (was: Open) > Introduce || operator > - > > Key: HIVE-14580 > URL: https://issues.apache.org/jira/browse/HIVE-14580 > Project: Hive > Issue Type: Sub-task > Components: SQL >Reporter: Ashutosh Chauhan >Assignee: Zoltan Haindrich > Attachments: HIVE-14580.1.patch > > > Functionally equivalent to concat() udf. But standard allows usage of || for > string concatenations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514540#comment-15514540 ] Sergio Peña commented on HIVE-14373: Thanks [~poeppt]. I will take a look at the patch tomorrow or early next week. > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch > > > With Hive doing improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because it will need > Amazon credentials. We need to write suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514528#comment-15514528 ] Hive QA commented on HIVE-12222: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829905/HIVE-12222.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10556 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.rpc.TestRpc.testServerPort {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1277/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1277/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1277/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12829905 - PreCommit-HIVE-Build > Define port range in property for RPCServer > --- > > Key: HIVE-12222 > URL: https://issues.apache.org/jira/browse/HIVE-12222 > Project: Hive > Issue Type: Improvement > Components: CLI, Spark >Affects Versions: 1.2.1 > Environment: Apache Hadoop 2.7.0 > Apache Hive 1.2.1 > Apache Spark 1.5.1 >Reporter: Andrew Lee >Assignee: Aihua Xu > Attachments: HIVE-12222.1.patch > > > Creating this JIRA after discussing with Xuefu on the dev mailing list. Would > need some help to review and update the fields in this JIRA ticket, thanks. > I notice that in > ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java > The port number is assigned with 0 which means it will be a random port every > time when the RPC Server is created to talk to Spark in the same session. > Because of this, this is causing problems to configure firewall between the > HiveCLI RPC Server and Spark due to unpredictable port numbers here. In other > words, users need to open all hive ports range > from Data Node => HiveCLI (edge node). > {code} > this.channel = new ServerBootstrap() > .group(group) > .channel(NioServerSocketChannel.class) > .childHandler(new ChannelInitializer<SocketChannel>() { > @Override > public void initChannel(SocketChannel ch) throws Exception { > SaslServerHandler saslHandler = new SaslServerHandler(config); > final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, > group); > saslHandler.rpc = newRpc; > Runnable cancelTask = new Runnable() { > @Override > public void run() { > LOG.warn("Timed out waiting for hello from client."); > newRpc.close(); > } > }; > saslHandler.cancelTask = group.schedule(cancelTask, > RpcServer.this.config.getServerConnectTimeoutMs(), > TimeUnit.MILLISECONDS); > } > }) > {code} > 2 Main reasons. > - Most users (what I see and encounter) use HiveCLI as a command line tool, > and in order to use that, they need to login to the edge node (via SSH). Now, > here comes the interesting part. 
> Could be true or not, but this is what I observe and encounter from time to > time. Most users will abuse the resource on that edge node (increasing > HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflow, > etc), this may cause the HS2 process to run into OOME, choke and die, etc. > various resource issues including others like login, etc. > - Analyst connects to Hive via HS2 + ODBC. So HS2 needs to be highly > available. This makes sense to run it on the gateway node or a service node > and separated from the HiveCLI. > The logs are located in different location, monitoring and auditing is easier > to run HS2 with a daemon user account, etc. so we don't want users to run > HiveCLI where HS2 is running. > It's better to isolate the resource this way to avoid any memory, file > handlers, disk space, issues. > From a security standpoint, > - Since users can login to edge node (via SSH), the security on the edge node > needs to be fortified and enhanced. Therefore, all the FW comes in and > auditing. > - Regulation/compliance for auditing is another requirement to monitor all > traffic, specifying ports and locking down the ports makes it easier since we > can focus > on a range to monitor and audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514485#comment-15514485 ] Xuefu Zhang commented on HIVE-12222: Hi [~aihuaxu], thanks for working on this. The patch looks good. I have two minor questions: 1. Do we have a strict syntax requirement on the format of the new property value? For instance, what happens if there is space around ',' or '-'? 2. What happens if the randomly selected port is not available? Should we retry until we get a good one? > Define port range in property for RPCServer > --- > > Key: HIVE-12222 > URL: https://issues.apache.org/jira/browse/HIVE-12222 > Project: Hive > Issue Type: Improvement > Components: CLI, Spark >Affects Versions: 1.2.1 > Environment: Apache Hadoop 2.7.0 > Apache Hive 1.2.1 > Apache Spark 1.5.1 >Reporter: Andrew Lee >Assignee: Aihua Xu > Attachments: HIVE-12222.1.patch > > > Creating this JIRA after discussing with Xuefu on the dev mailing list. Would > need some help to review and update the fields in this JIRA ticket, thanks. > I notice that in > ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java > The port number is assigned with 0 which means it will be a random port every > time when the RPC Server is created to talk to Spark in the same session. > Because of this, this is causing problems to configure firewall between the > HiveCLI RPC Server and Spark due to unpredictable port numbers here. In other > words, users need to open all hive ports range > from Data Node => HiveCLI (edge node). 
> {code} > this.channel = new ServerBootstrap() > .group(group) > .channel(NioServerSocketChannel.class) > .childHandler(new ChannelInitializer<SocketChannel>() { > @Override > public void initChannel(SocketChannel ch) throws Exception { > SaslServerHandler saslHandler = new SaslServerHandler(config); > final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, > group); > saslHandler.rpc = newRpc; > Runnable cancelTask = new Runnable() { > @Override > public void run() { > LOG.warn("Timed out waiting for hello from client."); > newRpc.close(); > } > }; > saslHandler.cancelTask = group.schedule(cancelTask, > RpcServer.this.config.getServerConnectTimeoutMs(), > TimeUnit.MILLISECONDS); > } > }) > {code} > 2 Main reasons. > - Most users (what I see and encounter) use HiveCLI as a command line tool, > and in order to use that, they need to login to the edge node (via SSH). Now, > here comes the interesting part. > Could be true or not, but this is what I observe and encounter from time to > time. Most users will abuse the resource on that edge node (increasing > HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflow, > etc), this may cause the HS2 process to run into OOME, choke and die, etc. > various resource issues including others like login, etc. > - Analyst connects to Hive via HS2 + ODBC. So HS2 needs to be highly > available. This makes sense to run it on the gateway node or a service node > and separated from the HiveCLI. > The logs are located in different location, monitoring and auditing is easier > to run HS2 with a daemon user account, etc. so we don't want users to run > HiveCLI where HS2 is running. > It's better to isolate the resource this way to avoid any memory, file > handlers, disk space, issues. > From a security standpoint, > - Since users can login to edge node (via SSH), the security on the edge node > needs to be fortified and enhanced. Therefore, all the FW comes in and > auditing. 
> - Regulation/compliance for auditing is another requirement to monitor all > traffic, specifying ports and locking down the ports makes it easier since we > can focus > on a range to monitor and audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
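Xuefu's two questions in this thread, tolerating whitespace in the property value and retrying when a candidate port is taken, can both be sketched briefly. The following is a hedged illustration under assumed names, not Hive's actual RpcServer change:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: parse a port-range property such as
// "49152-49155, 50000", tolerating whitespace around ',' and '-', then try
// each candidate until one can actually be bound.
public class PortRanges {
    static List<Integer> parse(String spec) {
        List<Integer> ports = new ArrayList<>();
        for (String part : spec.split(",")) {
            String p = part.trim();                  // tolerate spaces around ','
            if (p.isEmpty()) continue;
            int dash = p.indexOf('-');
            if (dash < 0) {
                ports.add(Integer.parseInt(p));
            } else {
                int lo = Integer.parseInt(p.substring(0, dash).trim());
                int hi = Integer.parseInt(p.substring(dash + 1).trim());
                for (int port = lo; port <= hi; port++) ports.add(port);
            }
        }
        return ports;
    }

    // Returns the first port that can be bound, or 0 (ephemeral) if none are
    // free -- mirroring the "fall back to a random port" behavior discussed.
    static int firstFree(List<Integer> candidates) {
        for (int port : candidates) {
            try (ServerSocket s = new ServerSocket(port)) {
                return port;
            } catch (IOException busy) {
                // port in use; try the next candidate
            }
        }
        return 0;
    }
}
```

Probing with a bind attempt (rather than checking once and binding later) avoids a race where the port is taken between the check and the bind; returning 0 preserves the old random-port behavior as a last resort.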
[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514470#comment-15514470 ] Jason Dere commented on HIVE-13903: --- Hi [~prongs], just trying to get a little background on this one - was the JAR being downloaded once per session, or was it getting downloaded every time the UDF was being used, even in the same session? > getFunctionInfo is downloading jar on every call > > > Key: HIVE-13903 > URL: https://issues.apache.org/jira/browse/HIVE-13903 > Project: Hive > Issue Type: Bug >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Fix For: 2.1.0 > > Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, > HIVE-13903.02.patch > > > on queries using permanent udfs, the jar file of the udf is downloaded > multiple times. Each call originating from Registry.getFunctionInfo. This > increases time for the query, especially if that query is just an explain > query. The jar should be downloaded once, and not downloaded again if the udf > class is accessible in the current thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
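The behavior described in this issue, download the jar once and skip the download entirely when the UDF class is already accessible on the current thread, might look roughly like the sketch below. The names are hypothetical; the real fix lives in Hive's function Registry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch (not the actual Registry code): skip localization when
// the class is already loadable in the current thread, and remember which
// resource URIs have already been downloaded in this session.
public class UdfResourceCache {
    private final Map<String, Boolean> downloaded = new ConcurrentHashMap<>();
    private final Consumer<String> downloader; // stands in for the real download step

    public UdfResourceCache(Consumer<String> downloader) {
        this.downloader = downloader;
    }

    public void ensureAvailable(String className, String jarUri) {
        try {
            // Already visible to the current thread? Nothing to fetch.
            Class.forName(className, false, Thread.currentThread().getContextClassLoader());
            return;
        } catch (ClassNotFoundException needsJar) {
            // fall through and localize the jar, at most once per URI
        }
        downloaded.computeIfAbsent(jarUri, uri -> {
            downloader.accept(uri);
            return Boolean.TRUE;
        });
    }
}
```

With this shape, an explain query that resolves the same permanent UDF repeatedly pays for at most one download per session instead of one per getFunctionInfo call.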
[jira] [Commented] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly
[ https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514429#comment-15514429 ] Yongzhi Chen commented on HIVE-14820: - Simple change, LGTM +1 > RPC server for spark inside HS2 is not getting server address properly > -- > > Key: HIVE-14820 > URL: https://issues.apache.org/jira/browse/HIVE-14820 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14820.1.patch > > > When hive.spark.client.rpc.server.address is configured, this property is not > retrieved properly because we are getting the value by {{String hiveHost = > config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}} which always > returns null in getServerAddress() call of RpcConfiguration.java. Rather it > should be {{String hiveHost = > config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
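The root cause here is easy to reproduce outside Hive: Map.get takes Object, so passing the enum constant itself compiles cleanly but never matches a String key. A minimal reproduction, using an illustrative enum rather than the real HiveConf.ConfVars:

```java
import java.util.Map;

// Minimal reproduction of the bug class described above: an enum passed to
// Map.get(Object) is never equal to any String key, so the lookup silently
// returns null; the .varname String must be used instead.
public class EnumKeyLookup {
    enum ConfVars {
        SPARK_RPC_SERVER_ADDRESS("hive.spark.client.rpc.server.address");
        final String varname;
        ConfVars(String varname) { this.varname = varname; }
    }

    // Bug shape: compiles, always yields null.
    static String wrongLookup(Map<String, String> config) {
        return config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS);
    }

    // Fix shape: look up by the property's String name.
    static String rightLookup(Map<String, String> config) {
        return config.get(ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);
    }
}
```

Because get(Object) is deliberately untyped for backward compatibility, the compiler cannot flag the wrong variant; only a test (or a null at runtime) reveals it.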
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514387#comment-15514387 ] Thomas Poepping commented on HIVE-14373: It wouldn't make sense for these tests to be related, as I am touching almost no existing code. Can I get eyes on this? > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch > > > With Hive doing improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because it will need > Amazon credentials. We need to write suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14751) Add support for date truncation
[ https://issues.apache.org/jira/browse/HIVE-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514382#comment-15514382 ] Ashutosh Chauhan commented on HIVE-14751: - LGTM +1 Question: This currently only supports a timestamp argument. Shall it also support date and interval? We can add that support later, just want to make sure it's not something you missed. > Add support for date truncation > --- > > Key: HIVE-14751 > URL: https://issues.apache.org/jira/browse/HIVE-14751 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-14751.patch > > > Add support for {{floor(<timestamp> to <unit>)}}, which is equivalent to > {{date_trunc(<unit>, <timestamp>)}}. > https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
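The floor/date_trunc semantics under review can be sketched with java.time. This is an illustration of the intended truncation behavior only, not the Hive UDF implementation:

```java
import java.time.LocalDateTime;
import java.time.temporal.TemporalAdjusters;

// Illustrative date truncation in the spirit of date_trunc(unit, timestamp):
// everything below the requested granularity is zeroed out.
public class DateTrunc {
    static LocalDateTime trunc(LocalDateTime ts, String unit) {
        switch (unit) {
            case "year":
                return ts.with(TemporalAdjusters.firstDayOfYear()).toLocalDate().atStartOfDay();
            case "month":
                return ts.with(TemporalAdjusters.firstDayOfMonth()).toLocalDate().atStartOfDay();
            case "day":
                return ts.toLocalDate().atStartOfDay();
            case "hour":
                return ts.withMinute(0).withSecond(0).withNano(0);
            case "minute":
                return ts.withSecond(0).withNano(0);
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }
}
```

For example, truncating 2016-09-22T13:45:30 to "month" yields 2016-09-01T00:00, matching what PostgreSQL's date_trunc('month', ...) returns for the same input.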
[jira] [Commented] (HIVE-14579) Add support for date extract
[ https://issues.apache.org/jira/browse/HIVE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514352#comment-15514352 ] Ashutosh Chauhan commented on HIVE-14579: - clever trick of rewriting in parser +1 > Add support for date extract > > > Key: HIVE-14579 > URL: https://issues.apache.org/jira/browse/HIVE-14579 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Ashutosh Chauhan >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-14579.01.patch, HIVE-14579.patch, HIVE-14579.patch > > > https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514355#comment-15514355 ] Hive QA commented on HIVE-14373: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829898/HIVE-14373.05.patch {color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10555 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1276/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1276/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1276/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12829898 - PreCommit-HIVE-Build > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch > > > With Hive doing improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because it will need > Amazon credentials. We need to write suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514347#comment-15514347 ] Chaoyu Tang commented on HIVE-14713: LGTM, +1 > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, > HIVE-14713.3.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly
[ https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14820: Status: Patch Available (was: Open) patch-1: Change to use .varname as the key to the map. Otherwise, get(Object) will always return null. > RPC server for spark inside HS2 is not getting server address properly > -- > > Key: HIVE-14820 > URL: https://issues.apache.org/jira/browse/HIVE-14820 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14820.1.patch > > > When hive.spark.client.rpc.server.address is configured, this property is not > retrieved properly because we are getting the value by {{String hiveHost = > config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}} which always > returns null in getServerAddress() call of RpcConfiguration.java. Rather it > should be {{String hiveHost = > config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14820) RPC server for spark inside HS2 is not getting server address properly
[ https://issues.apache.org/jira/browse/HIVE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14820: Attachment: HIVE-14820.1.patch > RPC server for spark inside HS2 is not getting server address properly > -- > > Key: HIVE-14820 > URL: https://issues.apache.org/jira/browse/HIVE-14820 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14820.1.patch > > > When hive.spark.client.rpc.server.address is configured, this property is not > retrieved properly because we are getting the value by {{String hiveHost = > config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS);}} which always > returns null in getServerAddress() call of RpcConfiguration.java. Rather it > should be {{String hiveHost = > config.get(HiveConf.ConfVars.SPARK_RPC_SERVER_ADDRESS.varname);}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-12222: Component/s: Spark > Define port range in property for RPCServer > --- > > Key: HIVE-12222 > URL: https://issues.apache.org/jira/browse/HIVE-12222 > Project: Hive > Issue Type: Improvement > Components: CLI, Spark >Affects Versions: 1.2.1 > Environment: Apache Hadoop 2.7.0 > Apache Hive 1.2.1 > Apache Spark 1.5.1 >Reporter: Andrew Lee >Assignee: Aihua Xu > Attachments: HIVE-12222.1.patch > > > Creating this JIRA after discussing with Xuefu on the dev mailing list. Would > need some help to review and update the fields in this JIRA ticket, thanks. > I notice that in > ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java > The port number is assigned with 0 which means it will be a random port every > time when the RPC Server is created to talk to Spark in the same session. > Because of this, this is causing problems to configure firewall between the > HiveCLI RPC Server and Spark due to unpredictable port numbers here. In other > words, users need to open all hive ports range > from Data Node => HiveCLI (edge node). > {code} > this.channel = new ServerBootstrap() > .group(group) > .channel(NioServerSocketChannel.class) > .childHandler(new ChannelInitializer<SocketChannel>() { > @Override > public void initChannel(SocketChannel ch) throws Exception { > SaslServerHandler saslHandler = new SaslServerHandler(config); > final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, > group); > saslHandler.rpc = newRpc; > Runnable cancelTask = new Runnable() { > @Override > public void run() { > LOG.warn("Timed out waiting for hello from client."); > newRpc.close(); > } > }; > saslHandler.cancelTask = group.schedule(cancelTask, > RpcServer.this.config.getServerConnectTimeoutMs(), > TimeUnit.MILLISECONDS); > } > }) > {code} > 2 Main reasons. 
> - Most users (what I see and encounter) use HiveCLI as a command line tool, > and in order to use that, they need to login to the edge node (via SSH). Now, > here comes the interesting part. > Could be true or not, but this is what I observe and encounter from time to > time. Most users will abuse the resource on that edge node (increasing > HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflow, > etc), this may cause the HS2 process to run into OOME, choke and die, etc. > various resource issues including others like login, etc. > - Analyst connects to Hive via HS2 + ODBC. So HS2 needs to be highly > available. This makes sense to run it on the gateway node or a service node > and separated from the HiveCLI. > The logs are located in different location, monitoring and auditing is easier > to run HS2 with a daemon user account, etc. so we don't want users to run > HiveCLI where HS2 is running. > It's better to isolate the resource this way to avoid any memory, file > handlers, disk space, issues. > From a security standpoint, > - Since users can login to edge node (via SSH), the security on the edge node > needs to be fortified and enhanced. Therefore, all the FW comes in and > auditing. > - Regulation/compliance for auditing is another requirement to monitor all > traffic, specifying ports and locking down the ports makes it easier since we > can focus > on a range to monitor and audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514263#comment-15514263 ] Lefty Leverenz commented on HIVE-14793: --- Okay, thanks. > Allow ptest branch to be specified, PROFILE override > > > Key: HIVE-14793 > URL: https://issues.apache.org/jira/browse/HIVE-14793 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch, > HIVE-14793.03.patch > > > Post HIVE-14734 - the profile is automatically determined. Add an option to > override this via Jenkins. Also add an option to specify the branch from > which ptest is built (This is hardcoded to github.com/apache/hive) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14818) Reduce number of retries while starting HiveServer for tests
[ https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514259#comment-15514259 ] Prasanth Jayachandran commented on HIVE-14818: -- Not relevant to this issue, but the default 30-minute sleep time feels like a lot to me. Other than that, patch lgtm. +1 > Reduce number of retries while starting HiveServer for tests > > > Key: HIVE-14818 > URL: https://issues.apache.org/jira/browse/HIVE-14818 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14818.01.patch > > > Current is 30 retries, with a 1-minute sleep between each one. > These settings are likely bad for a production cluster as well. For tests, this > should be a lot lower. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14818) Reduce number of retries while starting HiveServer for tests
[ https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14818: -- Attachment: HIVE-14818.01.patch [~thejas], [~prasanth_j] - please review. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14818) Reduce number of retries while starting HiveServer for tests
[ https://issues.apache.org/jira/browse/HIVE-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14818: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
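The retry count and sleep interval discussed above can be factored into parameters so tests can pass small values. A minimal sketch — `StartupRetry` and `waitUntil` are hypothetical names, not Hive's actual startup code:

```java
import java.util.function.BooleanSupplier;

public class StartupRetry {
    // Poll `check` up to maxRetries times, sleeping sleepMs between attempts.
    // Returns true as soon as the check passes, false if retries are exhausted.
    public static boolean waitUntil(BooleanSupplier check, int maxRetries, long sleepMs) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxRetries) { // no point sleeping after the last attempt
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A production cluster might call this with (30, 60000) while tests pass something like (5, 500), which is the spirit of the patch.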
[jira] [Updated] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType
[ https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-14819: -- Status: Patch Available (was: Open) > FunctionInfo for permanent functions shows TEMPORARY FunctionType > - > > Key: HIVE-14819 > URL: https://issues.apache.org/jira/browse/HIVE-14819 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.1.0 >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-14819.1.patch > > > FunctionInfo has a FunctionType field which describes whether the function is > a builtin/persistent/temporary function. But for permanent functions, the > FunctionInfo returned by the FunctionRegistry shows the type as > TEMPORARY. > This affects anything that depends on the function type, for example > LlapDecider, which allows builtin/persistent UDFs to be used in LLAP but > not temporary functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType
[ https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514230#comment-15514230 ] Jason Dere commented on HIVE-14819: --- Patch to allow the registry to set PERSISTENT type when registering permanent functions to the session registry. Previously all functions added to the session registry had the TEMPORARY tag. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14819) FunctionInfo for permanent functions shows TEMPORARY FunctionType
[ https://issues.apache.org/jira/browse/HIVE-14819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-14819: -- Attachment: HIVE-14819.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
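The fix described in the comment — recording a function's type at registration time instead of defaulting everything to TEMPORARY — can be illustrated with a toy registry. This is not Hive's real FunctionRegistry API; all names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class FunctionRegistrySketch {
    public enum FunctionType { BUILTIN, PERSISTENT, TEMPORARY }

    public static final class FunctionInfo {
        private final String name;
        private final FunctionType type;
        FunctionInfo(String name, FunctionType type) {
            this.name = name;
            this.type = type;
        }
        public FunctionType getType() { return type; }
        public String getName() { return name; }
    }

    private final Map<String, FunctionInfo> functions = new HashMap<>();

    // Store the caller-supplied type rather than hard-coding TEMPORARY,
    // so consumers (e.g. an LLAP decider) can distinguish persistent UDFs.
    public void register(String name, FunctionType type) {
        functions.put(name, new FunctionInfo(name, type));
    }

    public FunctionInfo lookup(String name) {
        return functions.get(name);
    }
}
```

The bug pattern being fixed is a `register` that ignores its type argument (or has none) and always tags entries TEMPORARY, which then misleads type-sensitive consumers downstream.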
[jira] [Commented] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514219#comment-15514219 ] Lefty Leverenz commented on HIVE-9423: -- +1 for the new error messages (patch 4) > HiveServer2: Provide the user with different error messages depending on the > Thrift client exception code > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, > HIVE-9423.patch > > > An example of where this is needed: it has been reported that when the number of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user becomes aware of the problem and can take > appropriate steps (close existing connections, bump up the config > value, or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behavior of the background thread > pool to have well-defined behavior when the pool gets exhausted. > Ideally, implementing some form of general admission control would be a better > solution, so that we do not accept new work unless sufficient resources are > available and degrade gracefully under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-14683) ptest uses invalid killall command
[ https://issues.apache.org/jira/browse/HIVE-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved HIVE-14683. --- Resolution: Duplicate Assignee: (was: Siddharth Seth) > ptest uses invalid killall command > -- > > Key: HIVE-14683 > URL: https://issues.apache.org/jira/browse/HIVE-14683 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth > > killall -q -9 -f java > -f is an invalid flag -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514199#comment-15514199 ] Hive QA commented on HIVE-14713: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829889/HIVE-14713.3.patch {color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10573 tests executed *Failed tests:* {noformat} TestCliDriver-llap_acid.q-explain_ddl.q-masking_3.q-and-27-more - did not produce a TEST-*.xml file TestCliDriver-ql_rewrite_gbtoidx.q-json_serde1.q-auto_join23.q-and-27-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1275/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1275/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1275/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12829889 - PreCommit-HIVE-Build > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, > HIVE-14713.3.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14817) Shutdown the SessionManager timeoutChecker thread properly upon shutdown
[ https://issues.apache.org/jira/browse/HIVE-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14817: -- Status: Patch Available (was: Open) > Shutdown the SessionManager timeoutChecker thread properly upon shutdown > > > Key: HIVE-14817 > URL: https://issues.apache.org/jira/browse/HIVE-14817 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14817.01.patch > > > Shutdown for SessionManager waits 10 seconds for all threads in the > thread pool executor to shut down correctly. > The cleaner thread - with default settings - will take 6 hours to shut down, > so essentially any shutdown of HS2 is always delayed by 10s. > The cleaner thread should be shut down properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14817) Shutdown the SessionManager timeoutChecker thread properly upon shutdown
[ https://issues.apache.org/jira/browse/HIVE-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14817: -- Attachment: HIVE-14817.01.patch [~thejas], [~prasanth_j] - please review. This cuts the test runtime of TestXSRFDFilter by 40 seconds, and likely other tests as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
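A checker thread that sleeps for hours can still be stopped promptly if it runs on an executor whose shutdown interrupts it. A hedged sketch of the pattern — `TimeoutChecker` is a hypothetical class, not the actual SessionManager code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimeoutChecker {
    private final ScheduledExecutorService pool =
        Executors.newSingleThreadScheduledExecutor();

    // Run `check` periodically; the executor handles the waiting,
    // instead of the task sleeping for the whole interval itself.
    public void start(Runnable check, long intervalMs) {
        pool.scheduleWithFixedDelay(check, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    // shutdownNow() interrupts a blocked task, so stopping takes milliseconds
    // rather than waiting out a multi-hour sleep (or a fixed 10s timeout).
    public boolean stop(long waitMs) {
        pool.shutdownNow();
        try {
            return pool.awaitTermination(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The design point is the one the issue makes: a plain `Thread.sleep(sixHours)` loop ignores shutdown until the sleep elapses, whereas an interruptible scheduled task terminates as soon as `shutdownNow()` is called.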
[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated HIVE-14731: Attachment: HIVE-14731.8.patch Upload patch to fix TestMiniLlapCliDriver[cross_join]. Other test failures are irrelevant. > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.1.patch, HIVE-14731.2.patch, > HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, > HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14774) Canceling query using Ctrl-C in beeline might lead to stale locks
[ https://issues.apache.org/jira/browse/HIVE-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-14774: --- Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Committed to 2.2.0 and 2.1.1. Thanks [~jxiang] [~mohitsabharwal] for the review. > Canceling query using Ctrl-C in beeline might lead to stale locks > - > > Key: HIVE-14774 > URL: https://issues.apache.org/jira/browse/HIVE-14774 > Project: Hive > Issue Type: Bug > Components: Locking >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14774.patch > > > Terminating a running query using Ctrl-C in Beeline might lead to stale locks, > since the process running the query might still acquire the locks > but fail to release them after the query terminates abnormally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
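The general pattern for avoiding stale locks on abnormal termination is to release in a `finally` block, so a cancellation surfacing as an exception cannot skip the unlock. A generic sketch with `java.util.concurrent` locks — Hive's actual lock manager API differs, and `QueryRunner` is an illustrative name:

```java
import java.util.concurrent.locks.Lock;

public class QueryRunner {
    // Acquire, run, and always release: the finally block runs even when
    // the query is cancelled mid-flight and unwinds with an exception.
    public static void runWithLock(Lock lock, Runnable query) {
        lock.lock();
        try {
            query.run();
        } finally {
            lock.unlock(); // executes on success, failure, and cancellation alike
        }
    }
}
```

The stale-lock bug class this guards against is acquire-then-release written as straight-line code, where any abnormal exit between the two leaves the lock held forever.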
[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-12222: Attachment: HIVE-12222.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-12222: Attachment: (was: HIVE-12222.1.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514109#comment-15514109 ] Aihua Xu commented on HIVE-12222: - [~xuefuz] Can you help review the patch? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-12222: Status: Patch Available (was: Open) Patch 1: made the change to add a configuration for the port range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-12222: Attachment: HIVE-12222.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Poepping updated HIVE-14373: --- Attachment: (was: HIVE-14373.05.patch) > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests can't be executed by HiveQA because they need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and HiveQA should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Poepping updated HIVE-14373: --- Status: Patch Available (was: In Progress) > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.05.patch, > HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests can't be executed by HiveQA because they need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and HiveQA should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Poepping updated HIVE-14373: --- Attachment: HIVE-14373.05.patch I'll create a new review-board submission, as I can't edit the old one. This patch takes what Abdullah had and adds improvements, including: * added an abstraction in the CliDrivers to increase code reuse * allowed QTEST_LEAVE_FILES (implemented in HIVE-8100) to be used to leave files in the blobstore for debugging * implemented unique blobstore paths for individual test runs, so if multiple people start test runs at the same time with the same blobstore path, there will be no collisions * moved test.blobstore.path to the conf.xml file, so it need not be specified each time * fixed the README and added more examples > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.05.patch, > HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests can't be executed by HiveQA because they need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and HiveQA should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
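The "unique blobstore paths for individual test runs" item above can be sketched as follows (illustrative only, not the patch itself; the class name and UUID-based suffix scheme are assumptions, while `test.blobstore.path` comes from the description):

```java
import java.util.UUID;

public class BlobstorePathFactory {

    /**
     * Append a per-run unique suffix to the configured base path
     * (test.blobstore.path), so two people running the suite against the
     * same bucket at the same time cannot collide.
     */
    public static String uniqueRunPath(String configuredBasePath) {
        String base = configuredBasePath.endsWith("/")
            ? configuredBasePath.substring(0, configuredBasePath.length() - 1)
            : configuredBasePath;
        return base + "/" + UUID.randomUUID();
    }
}
```

Each test run would then write under its own prefix, e.g. `s3a://bucket/tests/<uuid>/...`, and the whole prefix can be deleted (or kept, when QTEST_LEAVE_FILES is set) after the run.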
[jira] [Commented] (HIVE-14774) Canceling query using Ctrl-C in beeline might lead to stale locks
[ https://issues.apache.org/jira/browse/HIVE-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513999#comment-15513999 ] Mohit Sabharwal commented on HIVE-14774: LGTM as well +1 > Canceling query using Ctrl-C in beeline might lead to stale locks > - > > Key: HIVE-14774 > URL: https://issues.apache.org/jira/browse/HIVE-14774 > Project: Hive > Issue Type: Bug > Components: Locking >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-14774.patch > > > Terminating a running query using Ctrl-C in Beeline might lead to stale locks, > since the process running the query might still be able to acquire the locks > but fail to release them after the query terminates abnormally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
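The failure mode described is that the process acquiring the locks keeps running after the Beeline client dies, so nothing releases them. As a rough illustration of the cleanup involved (a sketch with invented names, not the attached fix, which must also handle the server-side session that a client-side hook cannot reach):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LockTracker {
    private final List<String> heldLocks = new CopyOnWriteArrayList<>();

    public LockTracker() {
        // Release whatever we still hold if the JVM is shut down mid-query,
        // e.g. when the user hits Ctrl-C.
        Runtime.getRuntime().addShutdownHook(new Thread(this::releaseAll));
    }

    public void acquired(String lockId) {
        heldLocks.add(lockId);
    }

    public void releaseAll() {
        for (String lockId : heldLocks) {
            // In a real system this would call the lock manager's release API.
            System.out.println("releasing lock " + lockId);
        }
        heldLocks.clear();
    }

    public int heldCount() {
        return heldLocks.size();
    }
}
```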
[jira] [Updated] (HIVE-14814) metastoreClient is used directly in Hive cause NPE
[ https://issues.apache.org/jira/browse/HIVE-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14814: - Target Version/s: 1.3.0, 2.2.0, 2.1.1 (was: 1.3.0, 2.1.0, 2.2.0) > metastoreClient is used directly in Hive cause NPE > -- > > Key: HIVE-14814 > URL: https://issues.apache.org/jira/browse/HIVE-14814 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Dileep Kumar Chiguruvada >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14814.1.patch > > > Changes introduced by HIVE-13622 use metastoreClient directly in Hive.java, > which may be null, causing an NPE. Instead it should use getMSC(), which > initializes the metastoreClient variable when null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14814) metastoreClient is used directly in Hive cause NPE
[ https://issues.apache.org/jira/browse/HIVE-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14814: - Resolution: Fixed Fix Version/s: 2.1.0 2.2.0 1.3.0 Target Version/s: 2.1.0, 1.3.0, 2.2.0 (was: 1.3.0, 2.1.0, 2.2.0) Status: Resolved (was: Patch Available) Test failures are unrelated. Committed to all branches. > metastoreClient is used directly in Hive cause NPE > -- > > Key: HIVE-14814 > URL: https://issues.apache.org/jira/browse/HIVE-14814 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Dileep Kumar Chiguruvada >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.2.0, 2.1.0 > > Attachments: HIVE-14814.1.patch > > > Changes introduced by HIVE-13622 use metastoreClient directly in Hive.java, > which may be null, causing an NPE. Instead it should use getMSC(), which > initializes the metastoreClient variable when null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14814) metastoreClient is used directly in Hive cause NPE
[ https://issues.apache.org/jira/browse/HIVE-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14814: - Fix Version/s: (was: 2.1.0) 2.1.1 > metastoreClient is used directly in Hive cause NPE > -- > > Key: HIVE-14814 > URL: https://issues.apache.org/jira/browse/HIVE-14814 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Dileep Kumar Chiguruvada >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14814.1.patch > > > Changes introduced by HIVE-13622 use metastoreClient directly in Hive.java, > which may be null, causing an NPE. Instead it should use getMSC(), which > initializes the metastoreClient variable when null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
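The fix described above, calling getMSC() instead of reading the field, is the standard lazy-initialization getter pattern. A stripped-down sketch (the class and method names mirror the description; the bodies are invented stand-ins, not Hive's code):

```java
public class Hive {
    private MetaStoreClient metaStoreClient;  // may be null until first use

    /** Lazily create the client; callers must never read the field directly. */
    public synchronized MetaStoreClient getMSC() {
        if (metaStoreClient == null) {
            metaStoreClient = new MetaStoreClient();
        }
        return metaStoreClient;
    }

    // Buggy pattern from the report: NPE if nothing initialized the field yet.
    public String buggyAccess() {
        return metaStoreClient.name();
    }

    // Fixed pattern: always go through the getter.
    public String safeAccess() {
        return getMSC().name();
    }

    /** Hypothetical stand-in for the real metastore client. */
    static class MetaStoreClient {
        String name() {
            return "metastore";
        }
    }
}
```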
[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-14713: - Status: Patch Available (was: In Progress) > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, > HIVE-14713.3.patch > > > Currently the LdapAuthenticationProviderImpl class is not covered by unit > tests. To make this class testable, some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-14713: - Attachment: HIVE-14713.3.patch > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, > HIVE-14713.3.patch > > > Currently the LdapAuthenticationProviderImpl class is not covered by unit > tests. To make this class testable, some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513975#comment-15513975 ] Illya Yalovyy commented on HIVE-14713: -- The patch was updated with a minor performance improvement. > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch, > HIVE-14713.3.patch > > > Currently the LdapAuthenticationProviderImpl class is not covered by unit > tests. To make this class testable, some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-14713: - Status: In Progress (was: Patch Available) > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch > > > Currently the LdapAuthenticationProviderImpl class is not covered by unit > tests. To make this class testable, some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12222) Define port range in property for RPCServer
[ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-12222: --- Assignee: Aihua Xu > Define port range in property for RPCServer > --- > > Key: HIVE-12222 > URL: https://issues.apache.org/jira/browse/HIVE-12222 > Project: Hive > Issue Type: Improvement > Components: CLI >Affects Versions: 1.2.1 > Environment: Apache Hadoop 2.7.0 > Apache Hive 1.2.1 > Apache Spark 1.5.1 >Reporter: Andrew Lee >Assignee: Aihua Xu > > Creating this JIRA after discussing with Xuefu on the dev mailing list. Would > need some help to review and update the fields in this JIRA ticket, thanks. > I notice that in > ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java > the port number is assigned 0, which means it will be a random port every > time the RPC Server is created to talk to Spark in the same session. > This causes problems when configuring a firewall between the > HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other > words, users need to open the whole range of Hive ports > from Data Node => HiveCLI (edge node). > {code} > this.channel = new ServerBootstrap() > .group(group) > .channel(NioServerSocketChannel.class) > .childHandler(new ChannelInitializer() { > @Override > public void initChannel(SocketChannel ch) throws Exception { > SaslServerHandler saslHandler = new SaslServerHandler(config); > final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, > group); > saslHandler.rpc = newRpc; > Runnable cancelTask = new Runnable() { > @Override > public void run() { > LOG.warn("Timed out waiting for hello from client."); > newRpc.close(); > } > }; > saslHandler.cancelTask = group.schedule(cancelTask, > RpcServer.this.config.getServerConnectTimeoutMs(), > TimeUnit.MILLISECONDS); > } > }) > {code} > Two main reasons: 
> - Most users (from what I see and encounter) use HiveCLI as a command-line tool, > and in order to use it, they need to log in to the edge node (via SSH). > From what I observe from time to > time, many users will abuse the resources on that edge node (increasing > HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, > etc.), which may cause the HS2 process to run into an OOME, choke and die, or hit > various other resource issues. > - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly > available. It makes sense to run it on a gateway node or a service node, > separated from the HiveCLI. > The logs live in a different location, and monitoring and auditing are easier > when HS2 runs under a daemon user account, so we don't want users to run > HiveCLI where HS2 is running. > It's better to isolate resources this way to avoid memory, file-handle, > and disk-space issues. > From a security standpoint: > - Since users can log in to the edge node (via SSH), security on the edge node > needs to be fortified and enhanced; this is where firewalls and > auditing come in. > - Regulation/compliance auditing is another requirement to monitor all > traffic; specifying and locking down ports makes this easier, since we can focus > on a known range to monitor and audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Poepping updated HIVE-14373: --- Attachment: HIVE-14373.05.patch I'll open a new review-board request for this, as I can't update the old one. This patch takes what Abdullah had and: * creates an abstraction in the CliDrivers for code reuse * allows the QTEST_LEAVE_FILES environment variable implemented in HIVE-8100 to be used to optionally leave files in S3 for inspection and debugging * abstracts the test.blobstore.path to the conf.xml file, so it doesn't need to be set each time * implements a unique folder identifier for each test run, so if multiple people run tests against the same blobstore path at the same time, there will be no collisions > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests can't be executed by HiveQA because they need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and HiveQA should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513905#comment-15513905 ] Hive QA commented on HIVE-9423: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829872/HIVE-9423.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1274/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1274/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1274/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12829872 - PreCommit-HIVE-Build > HiveServer2: Provide the user with different error messages depending on the > Thrift client exception code > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, > HIVE-9423.patch > > > An example of where it is needed: it has been reported that when the # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user becomes aware of the problem and can take > appropriate steps (either close existing connections, bump up the config > value, or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of the background thread > pool so that it has well-defined behavior when the pool gets exhausted. > Ideally, implementing some form of general admission control would be a better > solution, so that we do not accept new work unless sufficient resources are > available, and degrade gracefully under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Poepping updated HIVE-14373: --- Status: In Progress (was: Patch Available) > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Thomas Poepping > Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, > HIVE-14373.04.patch, HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests can't be executed by HiveQA because they need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and HiveQA should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13348) Add Event Nullification support for Replication
[ https://issues.apache.org/jira/browse/HIVE-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513804#comment-15513804 ] Sushanth Sowmyan commented on HIVE-13348: - Removing gsoc tag, as this was not pursued for GSoC. > Add Event Nullification support for Replication > --- > > Key: HIVE-13348 > URL: https://issues.apache.org/jira/browse/HIVE-13348 > Project: Hive > Issue Type: Sub-task > Components: Import/Export >Reporter: Sushanth Sowmyan > > Replication, as implemented by HIVE-7973, works as follows: > a) For every single modification to the Hive metastore, an event gets > triggered that logs a notification object. > b) Replication tools such as Falcon can consume these notification objects as > a HCatReplicationTaskIterator from > HCatClient.getReplicationTasks(lastEventId, maxEvents, dbName, tableName). > c) For each event, we generate statements and distcp requirements for Falcon > to export, distcp and import to do the replication (along with requisite > changes to export and import that would allow state management). > The big thing missing from this picture is that, while it works, it is pretty > dumb about how it works: it will exhaustively process every single > event generated, and will try to do the export-distcp-import cycle for all > modifications, irrespective of whether or not that will actually get used at > import time. > We need to build some sort of filtering logic which can process a batch of > events to identify events that will result in effective no-ops, and to > nullify those events from the stream before passing them on. The goal is to > minimize the number of events that tools like Falcon would actually have > to process. > Examples of cases where event nullification would take place: > a) CREATE-DROP cases: If an object is being created in event#34 that will > eventually get dropped in event#47, then there is no point in replicating > this along. 
We simply null out both these events, and also, any other event > that references this object between event#34 and event#47. > b) APPEND-APPEND : Some objects are replicated wholesale, which means every > APPEND that occurs would cause a full export of the object in question. At > this point, the prior APPENDS would all be supplanted by the last APPEND. > Thus, we could nullify all the prior such events. > Additional such cases can be inferred by analysis of the Export-Import relay > protocol definition at > https://issues.apache.org/jira/secure/attachment/12725999/EXIMReplicationReplayProtocol.pdf > or by reasoning out various event processing orders possible. > Replication, as implemented by HIVE-7973 is merely a first step for > functional support. This work is needed for replication to be efficient at > all, and thus, usable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
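The CREATE-DROP nullification rule described above can be sketched as a batch filter (illustrative only; the Event shape and the type strings are invented stand-ins, not Hive's notification API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventNullifier {

    /** Minimal stand-in for a metastore notification event. */
    public static class Event {
        final long id;
        final String type;   // e.g. "CREATE", "APPEND", "DROP"
        final String object; // e.g. "db.table"
        public Event(long id, String type, String object) {
            this.id = id; this.type = type; this.object = object;
        }
    }

    /**
     * CREATE-DROP rule: if an object is created and later dropped within the
     * batch, nullify both events and every event touching that object between
     * them, since replaying them on the destination is a no-op.
     */
    public static List<Event> nullifyCreateDrop(List<Event> batch) {
        Map<String, Long> createdAt = new HashMap<>();
        Map<String, long[]> deadRanges = new HashMap<>();
        for (Event e : batch) {
            if ("CREATE".equals(e.type)) {
                createdAt.put(e.object, e.id);
            } else if ("DROP".equals(e.type) && createdAt.containsKey(e.object)) {
                deadRanges.put(e.object, new long[] {createdAt.remove(e.object), e.id});
            }
        }
        List<Event> kept = new ArrayList<>();
        for (Event e : batch) {
            long[] dead = deadRanges.get(e.object);
            if (dead == null || e.id < dead[0] || e.id > dead[1]) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

With the example from the description, events #34 (CREATE) through #47 (DROP) on the same object are all nullified, while unrelated events pass through.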
[jira] [Updated] (HIVE-13348) Add Event Nullification support for Replication
[ https://issues.apache.org/jira/browse/HIVE-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-13348: Labels: (was: gsoc2016) > Add Event Nullification support for Replication > --- > > Key: HIVE-13348 > URL: https://issues.apache.org/jira/browse/HIVE-13348 > Project: Hive > Issue Type: Sub-task > Components: Import/Export >Reporter: Sushanth Sowmyan > > Replication, as implemented by HIVE-7973, works as follows: > a) For every single modification to the Hive metastore, an event gets > triggered that logs a notification object. > b) Replication tools such as Falcon can consume these notification objects as > a HCatReplicationTaskIterator from > HCatClient.getReplicationTasks(lastEventId, maxEvents, dbName, tableName). > c) For each event, we generate statements and distcp requirements for Falcon > to export, distcp and import to do the replication (along with requisite > changes to export and import that would allow state management). > The big thing missing from this picture is that, while it works, it is pretty > dumb about how it works: it will exhaustively process every single > event generated, and will try to do the export-distcp-import cycle for all > modifications, irrespective of whether or not that will actually get used at > import time. > We need to build some sort of filtering logic which can process a batch of > events to identify events that will result in effective no-ops, and to > nullify those events from the stream before passing them on. The goal is to > minimize the number of events that tools like Falcon would actually have > to process. > Examples of cases where event nullification would take place: > a) CREATE-DROP cases: If an object is being created in event#34 that will > eventually get dropped in event#47, then there is no point in replicating > this along. 
We simply null out both these events, and also, any other event > that references this object between event#34 and event#47. > b) APPEND-APPEND : Some objects are replicated wholesale, which means every > APPEND that occurs would cause a full export of the object in question. At > this point, the prior APPENDS would all be supplanted by the last APPEND. > Thus, we could nullify all the prior such events. > Additional such cases can be inferred by analysis of the Export-Import relay > protocol definition at > https://issues.apache.org/jira/secure/attachment/12725999/EXIMReplicationReplayProtocol.pdf > or by reasoning out various event processing orders possible. > Replication, as implemented by HIVE-7973 is merely a first step for > functional support. This work is needed for replication to be efficient at > all, and thus, usable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input
[ https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513748#comment-15513748 ] Aihua Xu commented on HIVE-14805: - Those tests are not related. > Subquery inside a view will have the object in the subquery as the direct > input > > > Key: HIVE-14805 > URL: https://issues.apache.org/jira/browse/HIVE-14805 > Project: Hive > Issue Type: Bug > Components: Views >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch > > > Here are the repro steps. > {noformat} > create table t1(col string); > create view v1 as select * from t1; > create view dataview as select * from (select * from v1) v2; > select * from dataview; > {noformat} > If Hive is configured with an authorization hook like Sentry, it will require > access not only for dataview but also for v1, which should not be > required. > The subquery seems not to carry the insideview property from the parent query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input
[ https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513745#comment-15513745 ] Hive QA commented on HIVE-14805: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829855/HIVE-14805.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1273/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1273/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1273/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12829855 - PreCommit-HIVE-Build > Subquery inside a view will have the object in the subquery as the direct > input > > > Key: HIVE-14805 > URL: https://issues.apache.org/jira/browse/HIVE-14805 > Project: Hive > Issue Type: Bug > Components: Views >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch > > > Here are the repro steps. 
> {noformat} > create table t1(col string); > create view v1 as select * from t1; > create view dataview as select * from (select * from v1) v2; > select * from dataview; > {noformat} > If Hive is configured with an authorization hook like Sentry, it will require > access not only for dataview but also for v1, which should not be > required. > The subquery seems not to carry the insideview property from the parent query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14582) Add trunc(numeric) udf
[ https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513700#comment-15513700 ] Ashutosh Chauhan commented on HIVE-14582: - I second [~niklaus.xiao]. Overloading the existing trunc should be possible and is much more desirable. Any other name would deviate from the SQL standard. > Add trunc(numeric) udf > -- > > Key: HIVE-14582 > URL: https://issues.apache.org/jira/browse/HIVE-14582 > Project: Hive > Issue Type: Sub-task > Components: SQL >Reporter: Ashutosh Chauhan >Assignee: Chinna Rao Lalam > Attachments: HIVE-14582.patch > > > https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
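For reference, Oracle-style trunc(n, d) truncates toward zero at the d-th decimal place (unlike round), with negative d truncating to the left of the decimal point. A sketch of those semantics (not the Hive UDF implementation from the attached patch):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class NumericTrunc {

    /**
     * Oracle-style TRUNC for numbers:
     *   trunc(15.79, 1)  -> 15.7
     *   trunc(15.79, -1) -> 10
     *   trunc(-15.79, 0) -> -15   (toward zero, never away from it)
     */
    public static BigDecimal trunc(BigDecimal n, int places) {
        // RoundingMode.DOWN discards digits past the scale, i.e. rounds toward zero.
        return n.setScale(places, RoundingMode.DOWN);
    }
}
```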
[jira] [Commented] (HIVE-14426) Extensive logging on info level in WebHCat
[ https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513658#comment-15513658 ] Peter Vary commented on HIVE-14426: --- Same errors as in HIVE-14098, plus some new ones: {code} 162d161 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver 168d166 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby2 170d167 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown 193d189 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_skewjoin 195,196d190 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_1 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_tests 199d192 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1 203,210d195 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_3 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_4 < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_multi_or_projection < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_varchar < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_math_funcs < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp < org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_ints_casts 270c255 {code} The new one is related to this (https://builds.apache.org/job/PreCommit-HIVE-Build/1272/testReport/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/org_apache_hadoop_hive_cli_TestMiniTezCliDriver/) My guess is that there was an error in one of the executors, since TestMiniTezCliDriver was running on other instances. 
(for example: https://builds.apache.org/job/PreCommit-HIVE-Build/1272/testReport/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_acid_globallimit/) So all-in-all I think none of the errors are related. Thanks, Peter > Extensive logging on info level in WebHCat > -- > > Key: HIVE-14426 > URL: https://issues.apache.org/jira/browse/HIVE-14426 > Project: Hive > Issue Type: Bug >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, > HIVE-14426.4.patch, HIVE-14426.5.patch, HIVE-14426.6.patch, > HIVE-14426.7.patch, HIVE-14426.8.patch, HIVE-14426.9-branch-2.1.patch, > HIVE-14426.9.patch, HIVE-14426.patch > > > There is an extensive logging in WebHCat at info level, and even some > sensitive information could be logged -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513653#comment-15513653 ] Vihang Karajgaonkar commented on HIVE-9423: --- Thanks for the patch [~pvary]. This issue has been a pain point for beeline users, and more user-friendly error messages help a lot. Patch looks good to me. > HiveServer2: Provide the user with different error messages depending on the > Thrift client exception code > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, > HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. > Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513528#comment-15513528 ] Xuefu Zhang commented on HIVE-14029: +1 on identifying the minimum set. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, > HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. > To update Spark version to 2.0.0, the following changes are required: > * Spark API updates: > ** SparkShuffler#call return Iterator instead of Iterable > ** SparkListener -> JavaSparkListener > ** InputMetrics constructor doesn’t accept readMethod > ** Method remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics > return long type instead of integer > * Dependency upgrade: > ** Jackson: 2.4.2 -> 2.6.5 > ** Netty version: 4.0.23.Final -> 4.0.29.Final > ** Scala binary version: 2.10 -> 2.11 > ** Scala version: 2.10.4 -> 2.11.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
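One of the API changes listed in the description above is that the shuffler's call now returns Iterator instead of Iterable. A hedged sketch of the usual adaptation pattern: the Shuffler interface below is an illustrative stand-in, not Hive's actual SparkShuffler signature.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterableToIterator {
    // Hypothetical stand-in mirroring the Spark 2.0 change:
    // call() now returns Iterator<T> rather than Iterable<T>.
    interface Shuffler<T> {
        Iterator<T> call();
    }

    // Code that previously produced an Iterable can adapt by
    // handing out that iterable's iterator.
    static <T> Shuffler<T> adapt(Iterable<T> rows) {
        return rows::iterator;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("k1", "k2");
        Iterator<String> it = adapt(rows).call();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The trade-off is that an Iterator is single-pass, so any caller that iterated the old Iterable more than once needs a fresh adapter per pass.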
[jira] [Commented] (HIVE-14426) Extensive logging on info level in WebHCat
[ https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513599#comment-15513599 ] Hive QA commented on HIVE-14426: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829802/HIVE-14426.9-branch-2.1.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 269 failed/errored test(s), 10355 tests executed *Failed tests:* {noformat} 249_TestHWISessionManager - did not produce a TEST-*.xml file 382_TestMsgBusConnection - did not produce a TEST-*.xml file 772_TestHiveDruidQueryBasedInputFormat - did not produce a TEST-*.xml file 773_TestDruidSerDe - did not produce a TEST-*.xml file 783_TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file 784_TestJdbcWithMiniKdc - did not produce a TEST-*.xml file 785_TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file 787_TestJdbcWithDBTokenStore - did not produce a TEST-*.xml file 788_TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file 789_TestJdbcNonKrbSASLWithMiniKdc - did not produce a TEST-*.xml file 791_TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_9 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_outer_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_json_serde1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
[jira] [Updated] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-9423: - Summary: HiveServer2: Provide the user with different error messages depending on the Thrift client exception code (was: HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted) > HiveServer2: Provide the user with different error messages depending on the > Thrift client exception code > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, > HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. > Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513552#comment-15513552 ] Xuefu Zhang commented on HIVE-9423: --- Could we update the JIRA title to reflect what the patch is actually doing? Thanks. > HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, > HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. > Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-9423: - Attachment: HIVE-9423.4.patch Changed the messages according to [~ctang.ma]'s suggestion. Thanks, Peter > HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, > HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. > Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513539#comment-15513539 ] Rui Li commented on HIVE-14029: --- I'm using: {noformat} java version "1.8.0_91" Java(TM) SE Runtime Environment (build 1.8.0_91-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode) {noformat} > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, > HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. > To update Spark version to 2.0.0, the following changes are required: > * Spark API updates: > ** SparkShuffler#call return Iterator instead of Iterable > ** SparkListener -> JavaSparkListener > ** InputMetrics constructor doesn’t accept readMethod > ** Method remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics > return long type instead of integer > * Dependency upgrade: > ** Jackson: 2.4.2 -> 2.6.5 > ** Netty version: 4.0.23.Final -> 4.0.29.Final > ** Scala binary version: 2.10 -> 2.11 > ** Scala version: 2.10.4 -> 2.11.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513534#comment-15513534 ] Sergio Peña commented on HIVE-14029: Which JDK are you using? Jenkins is using JDK 8. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, > HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. > To update Spark version to 2.0.0, the following changes are required: > * Spark API updates: > ** SparkShuffler#call return Iterator instead of Iterable > ** SparkListener -> JavaSparkListener > ** InputMetrics constructor doesn’t accept readMethod > ** Method remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics > return long type instead of integer > * Dependency upgrade: > ** Jackson: 2.4.2 -> 2.6.5 > ** Netty version: 4.0.23.Final -> 4.0.29.Final > ** Scala binary version: 2.10 -> 2.11 > ** Scala version: 2.10.4 -> 2.11.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513473#comment-15513473 ] Rui Li commented on HIVE-14029: --- It seems we have two {{javax.ws.rs.core.UriInfo}} interfaces from two jars: javax.ws.rs-api and jersey-core. Before the patch, we only had the one from jersey-core. Maybe there are some conflicts in the dependency upgrade. We need to fix it because it breaks the build. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, > HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. > To update Spark version to 2.0.0, the following changes are required: > * Spark API updates: > ** SparkShuffler#call return Iterator instead of Iterable > ** SparkListener -> JavaSparkListener > ** InputMetrics constructor doesn’t accept readMethod > ** Method remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics > return long type instead of integer > * Dependency upgrade: > ** Jackson: 2.4.2 -> 2.6.5 > ** Netty version: 4.0.23.Final -> 4.0.29.Final > ** Scala binary version: 2.10 -> 2.11 > ** Scala version: 2.10.4 -> 2.11.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
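When the same interface is shipped by two jars on the classpath (here, javax.ws.rs-api and jersey-core both providing {{javax.ws.rs.core.UriInfo}}), a quick way to see which copy actually won at runtime is to ask the class for its code source. A hedged, generic sketch: the classes probed below are placeholders, so substitute the conflicting class when debugging on the real Hive classpath.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from, or a marker
    // for bootstrap/platform classes, which report no code source.
    static String locationOf(Class<?> cls) {
        CodeSource cs = cls.getProtectionDomain().getCodeSource();
        return cs == null ? "<bootstrap/platform>" : cs.getLocation().toString();
    }

    public static void main(String[] args) {
        // e.g. locationOf(javax.ws.rs.core.UriInfo.class) on the Hive classpath
        System.out.println(locationOf(WhichJar.class));
        System.out.println(locationOf(String.class));
    }
}
```

On a Maven build, `mvn dependency:tree` also helps locate which dependency pulls in the duplicate jar, so an exclusion can be added at the right place.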
[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513451#comment-15513451 ] Chaoyu Tang commented on HIVE-9423: --- +1 > HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. > Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513430#comment-15513430 ] Sergio Peña commented on HIVE-14029: Sorry Fer, I meant 2.2 :P. I got confused with numbers. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, > HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. > To update Spark version to 2.0.0, the following changes are required: > * Spark API updates: > ** SparkShuffler#call return Iterator instead of Iterable > ** SparkListener -> JavaSparkListener > ** InputMetrics constructor doesn’t accept readMethod > ** Method remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics > return long type instead of integer > * Dependency upgrade: > ** Jackson: 2.4.2 -> 2.6.5 > ** Netty version: 4.0.23.Final -> 4.0.29.Final > ** Scala binary version: 2.10 -> 2.11 > ** Scala version: 2.10.4 -> 2.11.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14582) Add trunc(numeric) udf
[ https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513395#comment-15513395 ] Niklaus Xiao commented on HIVE-14582: - Is it possible to add {{trunc(number)}} logic to the existing {{trunc(date)}} implementation? > Add trunc(numeric) udf > -- > > Key: HIVE-14582 > URL: https://issues.apache.org/jira/browse/HIVE-14582 > Project: Hive > Issue Type: Sub-task > Components: SQL >Reporter: Ashutosh Chauhan >Assignee: Chinna Rao Lalam > Attachments: HIVE-14582.patch > > > https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14754) Track the queries execution lifecycle times
[ https://issues.apache.org/jira/browse/HIVE-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-14754: --- Component/s: (was: Metastore) > Track the queries execution lifecycle times > --- > > Key: HIVE-14754 > URL: https://issues.apache.org/jira/browse/HIVE-14754 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > We should be able to track the nr. of queries being compiled/executed at any > given time, as well as the duration of the execution and compilation phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14754) Track the queries execution lifecycle times
[ https://issues.apache.org/jira/browse/HIVE-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara reassigned HIVE-14754: -- Assignee: Barna Zsombor Klara > Track the queries execution lifecycle times > --- > > Key: HIVE-14754 > URL: https://issues.apache.org/jira/browse/HIVE-14754 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > We should be able to track the nr. of queries being compiled/executed at any > given time, as well as the duration of the execution and compilation phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513372#comment-15513372 ] Peter Vary commented on HIVE-9423: -- [~ctang.ma] That is a good question! :) What about these messages: {code} +hs2-unexpected-end-of-file: Unexpected end of file when reading from HS2 server. The root \ +cause might be too many concurrent connections. Please ask the administrator to check the number \ +of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable. +hs2-could-not-open-connection: Could not open connection to the HS2 server. Please check the \ +server URI and if the URI is correct, then ask the administrator to check the server status. {code} Thanks, Peter > HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. 
> Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14358) Add metrics for number of queries executed for each execution engine (mr, spark, tez)
[ https://issues.apache.org/jira/browse/HIVE-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513350#comment-15513350 ] Barna Zsombor Klara commented on HIVE-14358: Failures seem unrelated: most were failing before, and the one test that failed only in this run is flaky. > Add metrics for number of queries executed for each execution engine (mr, > spark, tez) > - > > Key: HIVE-14358 > URL: https://issues.apache.org/jira/browse/HIVE-14358 > Project: Hive > Issue Type: Task > Components: HiveServer2 >Affects Versions: 2.1.0 >Reporter: Lenni Kuff >Assignee: Barna Zsombor Klara > Attachments: HIVE-14358.patch > > > HiveServer2 currently has a metric for the total number of queries ran since > last restart, but it would be useful to also have metrics for number of > queries ran for each execution engine. This would improve supportability by > allowing users to get a high-level understanding of what workloads had been > running on the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513336#comment-15513336 ] Chaoyu Tang commented on HIVE-9423: --- [~pvary] I have a small question about the message and suggestion presented to Beeline users when they run into a login timeout issue such as {code} +hs2-unexpected-end-of-file: Unexpected end of file when reading from HS2 server. The root \ +cause might be too many concurrent connections. Please check the number of active \ +connections, and adjust hive.server2.thrift.max.worker.threads if applicable. {code} Do you think these Beeline users have the privilege to "check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable"? > HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. 
> Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input
[ https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513281#comment-15513281 ] Aihua Xu edited comment on HIVE-14805 at 9/22/16 1:24 PM: -- Patch-2: update 3 tests' baseline. Seems that's also the issue [~niklaus.xiao] mentioned in HIVE-10875. was (Author: aihuaxu): Patch-3: update 3 tests' baseline. Seems that's also the issue [~niklaus.xiao] mentioned in HIVE-10875. > Subquery inside a view will have the object in the subquery as the direct > input > > > Key: HIVE-14805 > URL: https://issues.apache.org/jira/browse/HIVE-14805 > Project: Hive > Issue Type: Bug > Components: Views >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch > > > Here is the repro steps. > {noformat} > create table t1(col string); > create view v1 as select * from t1; > create view dataview as select * from (select * from v1) v2; > select * from dataview; > {noformat} > If hive is configured with authorization hook like Sentry, it will require > the access not only for dataview but also for v1, which should not be > required. > The subquery seems to not carry insideview property from the parent query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14805) Subquery inside a view will have the object in the subquery as the direct input
[ https://issues.apache.org/jira/browse/HIVE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513294#comment-15513294 ] Yongzhi Chen commented on HIVE-14805: - The PATCH looks good. +1 > Subquery inside a view will have the object in the subquery as the direct > input > > > Key: HIVE-14805 > URL: https://issues.apache.org/jira/browse/HIVE-14805 > Project: Hive > Issue Type: Bug > Components: Views >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14805.1.patch, HIVE-14805.2.patch > > > Here is the repro steps. > {noformat} > create table t1(col string); > create view v1 as select * from t1; > create view dataview as select * from (select * from v1) v2; > select * from dataview; > {noformat} > If hive is configured with authorization hook like Sentry, it will require > the access not only for dataview but also for v1, which should not be > required. > The subquery seems to not carry insideview property from the parent query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)