[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu updated HIVE-14029:
--------------------------------
    Attachment: HIVE-14029.1.patch

Fixed some dependency issues.

> Update Spark version to 2.0.0
> -----------------------------
>
>                 Key: HIVE-14029
>                 URL: https://issues.apache.org/jira/browse/HIVE-14029
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ferdinand Xu
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump
> Spark up to 2.0.0 to benefit from those performance improvements.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on
[ https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-14783:
------------------------------------
    Status: Patch Available  (was: Reopened)

> bucketing column should be part of sorting for delete/update operation when
> spdo is on
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-14783
>                 URL: https://issues.apache.org/jira/browse/HIVE-14783
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Transactions
>    Affects Versions: 2.2.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14783.1.patch, HIVE-14783.patch
[jira] [Updated] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on
[ https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-14783:
------------------------------------
    Attachment: HIVE-14783.1.patch
[jira] [Reopened] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on
[ https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan reopened HIVE-14783:
-------------------------------------

Missed updating Select operator in Reducer.
[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505429#comment-15505429 ]

Sergio Peña edited comment on HIVE-14793 at 9/20/16 3:13 AM:
-------------------------------------------------------------

Thanks [~sseth]. A couple of comments:
1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used.
2. --outputDir is not necessary anymore. I fixed the test-results issue in HIVE-14790, and also in the Jenkins job config.

was (Author: spena): Thanks [~sseth]. A couple of comments

> Allow ptest branch to be specified, PROFILE override
> ----------------------------------------------------
>
>                 Key: HIVE-14793
>                 URL: https://issues.apache.org/jira/browse/HIVE-14793
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive, Testing Infrastructure
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-14793.01.patch
>
> Post HIVE-14734, the profile is automatically determined. Add an option to
> override this via Jenkins. Also add an option to specify the branch from
> which ptest is built (this is hardcoded to github.com/apache/hive).
[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505429#comment-15505429 ]

Sergio Peña commented on HIVE-14793:
------------------------------------

Thanks [~sseth]. A couple of comments
[jira] [Commented] (HIVE-14624) LLAP: Use FQDN when submitting work to LLAP
[ https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505376#comment-15505376 ]

Hive QA commented on HIVE-14624:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829283/HIVE-14624.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1235/testReport
Console output: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1235/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1235/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12829283 - jenkins-PreCommit-HIVE-Build

> LLAP: Use FQDN when submitting work to LLAP
> -------------------------------------------
>
>                 Key: HIVE-14624
>                 URL: https://issues.apache.org/jira/browse/HIVE-14624
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 2.2.0
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch,
> HIVE-14624.03.patch, HIVE-14624.patch
>
> {code}
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:    socketAddress.getHostName());
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:    host = socketAddress.getHostName();
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:    public static String getHostName() {
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:    return InetAddress.getLocalHost().getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:    String name = address.getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:    builder.setAmHost(address.getHostName());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java:    nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), localAddress.get().getPort());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:    localAddress.get().getHostName(), vertex.getDagName(), qIdProto.getDagIndex(),
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:    new ExecutionContextImpl(localAddress.get().getHostName()), env,
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java:    String hostName = MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:    .setBindAddress(addr.getHostName())
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java:    request.getContainerIdString(), executionContext.getHostName(), vertex.getDagName(),
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java:    String displayName = "LlapDaemonCacheMetrics-" + MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java:    displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java:    new LlapProtocolClientImpl(new Configuration(), serverAddr.getHostName(),
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java:    builder.setAmHost(getAddress().getHostName());
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:    String displayName = "LlapTaskSchedulerMetrics-" + MetricsUtils.getHostName();
> {code}
> On systems where the hostname does not match the FQDN, calling
> getCanonicalHostName() instead will allow the hostname to be resolved to the
> fully-qualified name.
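The distinction the description drives at can be seen with a tiny stand-alone sketch (illustrative only, not Hive code): {{getHostName()}} may return the short machine name, while {{getCanonicalHostName()}} asks the resolver for the fully-qualified name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative sketch (not Hive code): getHostName() can return a short
// name like "node1", while getCanonicalHostName() performs a reverse
// lookup and can return an FQDN like "node1.example.com".
public class FqdnExample {
    public static String shortName() throws UnknownHostException {
        return InetAddress.getLocalHost().getHostName();
    }

    public static String fqdn() throws UnknownHostException {
        return InetAddress.getLocalHost().getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("short: " + shortName());
        System.out.println("fqdn:  " + fqdn());
    }
}
```

Whether the two differ depends on the host's resolver configuration, which is exactly why hard-coding one or the other breaks on some clusters.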
[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"
[ https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505363#comment-15505363 ]

Rui Li commented on HIVE-14714:
-------------------------------

Hi [~gszadovszky], in that case, how about logging a brief message at DEBUG level? I'm just wary of swallowing exceptions. [~xuefuz], do you have any thoughts on this?

> Finishing Hive on Spark causes "java.io.IOException: Stream closed"
> -------------------------------------------------------------------
>
>                 Key: HIVE-14714
>                 URL: https://issues.apache.org/jira/browse/HIVE-14714
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 1.1.0
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>         Attachments: HIVE-14714.2.patch, HIVE-14714.patch
>
> After executing a Hive command with Spark, finishing the beeline session or
> even switching the engine causes an IOException. The following used Ctrl-D to
> finish the session, but "!quit" or even "set hive.execution.engine=mr;" causes
> the same issue.
> From the HS2 log:
> {code}
> 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote driver, interrupting...
> 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: [Driver]: Waiting thread interrupted, killing child process.
> 2016-09-06 16:15:12,296 WARN org.apache.hive.spark.client.SparkClientImpl: [stderr-redir-1]: Error in redirector thread.
> java.io.IOException: Stream closed
>         at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>         at java.io.BufferedReader.readLine(BufferedReader.java:317)
>         at java.io.BufferedReader.readLine(BufferedReader.java:382)
>         at org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
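A minimal sketch of the handling discussed in the comment above (hypothetical code, not the actual {{SparkClientImpl.Redirector}}): the redirector loop treats the IOException raised when the child's stream is closed underneath it as an expected end-of-stream, leaving only a brief DEBUG-level note instead of a WARN with a full stack trace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical sketch, not the actual SparkClientImpl.Redirector: forward a
// child process's output line by line, and treat the IOException thrown when
// the stream is closed underneath us (child exited) as an expected event.
public class Redirector implements Runnable {
    private final BufferedReader in;
    public int linesForwarded = 0;

    public Redirector(InputStream src) {
        this.in = new BufferedReader(new InputStreamReader(src));
    }

    @Override
    public void run() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // relay child output to our log
                linesForwarded++;
            }
        } catch (IOException e) {
            // "Stream closed" here usually just means the child has exited;
            // per the discussion above, a brief DEBUG-level message would do,
            // e.g.: LOG.debug("Redirector stream closed: {}", e.getMessage());
        }
    }
}
```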
[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505237#comment-15505237 ]

Hive QA commented on HIVE-14792:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829295/HIVE-14792.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1234/testReport
Console output: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1234/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1234/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829295 - jenkins-PreCommit-HIVE-Build

> AvroSerde reads the remote schema-file at least once per mapper, per table
> reference.
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14792
>                 URL: https://issues.apache.org/jira/browse/HIVE-14792
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1, 2.1.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-14792.1.patch
>
> Avro tables that use "external" schema files stored on HDFS can cause
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) throws SerDeException {
>   // ...
>   if (hasExternalSchema(properties)
>       || columnNameProperty == null || columnNameProperty.isEmpty()
>       || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>     schema = determineSchemaOrReturnErrorSchema(configuration, properties);
>   } else {
>     // Get column names and sort order
>     columnNames = Arrays.asList(columnNameProperty.split(","));
>     columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>     schema = getSchemaFromCols(properties, columnNames, columnTypes, columnCommentProperty);
>     properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), schema.toString());
>   }
>   // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized
> (i.e. at least once per mapper), the schema file is read remotely. For
> queries with thousands of mappers, this leads to a stampede to the handful
> (3?) of datanodes that host the schema-file. In the best case, this causes
> slowdowns.
> It would be preferable to distribute the Avro schema to all mappers as part
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive
> metastore (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the
> size-limit on table-parameters. The typical size of the Avro schema file is
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made
> available as part of table-properties (but not serialized into the
> metastore), the downstream logic would remain largely intact. I have a patch
> that does this.
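The approach described in the last paragraph can be sketched roughly as follows (assumed names and a stand-in API, not the actual patch): read the schema once at planning time and stash it under {{avro.schema.literal}}, so per-mapper SerDe initialization finds the schema locally. A real implementation would open the {{avro.schema.url}} through Hadoop's FileSystem API; plain {{java.nio}} stands in here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Minimal sketch of the idea in HIVE-14792 (hypothetical helper, not the
// actual patch): fetch the schema text once, at query-planning time, and
// store it as avro.schema.literal in the table properties, so mappers no
// longer need to re-open the remote schema file.
public class SchemaPrefetch {
    static final String SCHEMA_URL = "avro.schema.url";
    static final String SCHEMA_LITERAL = "avro.schema.literal";

    public static void prefetch(Properties tableProps) throws IOException {
        String url = tableProps.getProperty(SCHEMA_URL);
        if (url == null || tableProps.getProperty(SCHEMA_LITERAL) != null) {
            return;  // nothing to fetch, or the literal is already present
        }
        // Stand-in for FileSystem.open() on an HDFS URL: read the whole file.
        String schemaText = new String(Files.readAllBytes(Paths.get(url)));
        tableProps.setProperty(SCHEMA_LITERAL, schemaText);
    }
}
```

As the description notes, this sidesteps the table-parameter size limit only because the prefetched literal lives in the job-conf, not in the metastore.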
[jira] [Updated] (HIVE-14779) make DbTxnManager.HeartbeaterThread a daemon
[ https://issues.apache.org/jira/browse/HIVE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-14779:
----------------------------------
    Resolution: Fixed
    Fix Version/s: 2.1.1
                   2.2.0
    Status: Resolved  (was: Patch Available)

Thanks Alan for the review.

> make DbTxnManager.HeartbeaterThread a daemon
> --------------------------------------------
>
>                 Key: HIVE-14779
>                 URL: https://issues.apache.org/jira/browse/HIVE-14779
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.3.0, 2.1.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Minor
>             Fix For: 2.2.0, 2.1.1
>
>         Attachments: HIVE-14779.patch
>
> setDaemon(true);
> make heartbeaterThreadPoolSize static
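The one-line change named in the issue title can be sketched as follows (illustrative only, not the actual DbTxnManager code): a ThreadFactory that marks heartbeat threads as daemons, so a surviving heartbeater can no longer keep the JVM alive by itself.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Illustrative sketch of the fix described above (not the actual
// DbTxnManager code): threads marked with setDaemon(true) do not prevent
// JVM exit, so a leftover heartbeater cannot keep a dead client alive.
public class DaemonHeartbeater {
    // Create a named daemon thread; used as the pool's ThreadFactory below.
    public static Thread daemonThread(Runnable r) {
        Thread t = new Thread(r, "Heartbeater");
        t.setDaemon(true);  // the key change: JVM may exit while this runs
        return t;
    }

    public static ScheduledExecutorService newHeartbeatPool(int size) {
        return Executors.newScheduledThreadPool(size, DaemonHeartbeater::daemonThread);
    }
}
```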
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505144#comment-15505144 ]

Shannon Ladymon commented on HIVE-7926:
---------------------------------------

[~sershe], thanks for clarifying that! I had a few other things I'd like to reword but am not quite sure how. I'd appreciate any light you can shed on them:
* In the sentence "The initial stage of the query is pushed into #LLAP, large shuffle is performed in their own containers" - what does "their own containers" refer to? Is there only one large shuffle, or multiple shuffles?
* In the sentence "The node allows parallel execution for multiple query fragments from different queries and sessions" - what does "the node" refer to? A single LLAP node?

[~asears], I noticed that the last link on the page, titled ["Try Hive LLAP" | http://www.lewuathe.com/blog/2015/08/12/try-hive-llap/], is a broken link. Should I delete that link from the page, or is there an updated link you'd like to add? Also, the Web Services section is currently blank. Should this section be deleted, or is there content you intend to add?

> long-lived daemons for query fragment execution, I/O and caching
> ----------------------------------------------------------------
>
>                 Key: HIVE-7926
>                 URL: https://issues.apache.org/jira/browse/HIVE-7926
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: TODOC2.0
>             Fix For: 2.0.0
>
>         Attachments: LLAPdesigndocument.pdf
>
> We are proposing a new execution model for Hive that is a combination of
> existing process-based tasks and long-lived daemons running on worker nodes.
> These nodes can take care of efficient I/O, caching and query fragment
> execution, while heavy lifting like most joins, ordering, etc. can be handled
> by tasks.
> The proposed model is not a 2-system solution for small and large queries;
> nor is it a separate execution engine like MR or Tez. It can be used by
> any Hive execution engine, if support is added; in the future even external
> products (e.g. Pig) could use it.
> The document with the high-level design we are proposing will be attached shortly.
[jira] [Commented] (HIVE-14779) make DbTxnManager.HeartbeaterThread a daemon
[ https://issues.apache.org/jira/browse/HIVE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505075#comment-15505075 ]

Alan Gates commented on HIVE-14779:
-----------------------------------

+1, see comments above.
[jira] [Commented] (HIVE-14779) make DbTxnManager.HeartbeaterThread a daemon
[ https://issues.apache.org/jira/browse/HIVE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505073#comment-15505073 ]

Alan Gates commented on HIVE-14779:
-----------------------------------

Ok, so I agree this doesn't make the situation any worse, and in fact makes it better, since it avoids the situation where everything dies except the heartbeat thread, and that thread keeps the VM alive and keeps heartbeating. At some point in the future I do think we should solve the issue where a hanging CLI client could hang the system.
[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-14792:
----------------------------------------
    Status: Patch Available  (was: Open)

Submitting, to run tests.
[jira] [Updated] (HIVE-14794) HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.
[ https://issues.apache.org/jira/browse/HIVE-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-14794:
----------------------------------------
    Attachment: HIVE-14794.1.patch

This patch builds on HIVE-14792. It uses {{SpecialCases}} to prefetch the Avro schema.

> HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-14794
>                 URL: https://issues.apache.org/jira/browse/HIVE-14794
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 1.2.1, 2.1.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-14794.1.patch
>
> HIVE-14792 introduces support to modify and add properties to
> table-parameters during query-planning. It prefetches remote Avro-schema
> information and stores it in TBLPROPERTIES, under {{avro.schema.literal}}.
> We'll need similar support in {{HCatLoader}} to prevent excessive reads of
> schema-files in Pig queries.
[jira] [Updated] (HIVE-14794) HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.
[ https://issues.apache.org/jira/browse/HIVE-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-14794:
----------------------------------------
    Summary: HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.  (was: HCatalog support to pre-fetch for Avro tables that use avro.schema.url.)
[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon
[ https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-14790:
-------------------------------
    Resolution: Fixed
    Fix Version/s: 2.2.0
    Status: Resolved  (was: Patch Available)

I committed this to master without review so that we start seeing the test results on Jenkins.

> Jenkins is not displaying test results because 'set -e' is aborting the
> script too soon
> -----------------------------------------------------------------------
>
>                 Key: HIVE-14790
>                 URL: https://issues.apache.org/jira/browse/HIVE-14790
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive, Testing Infrastructure
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14790.1.patch
>
> NO PRECOMMIT TESTS
> Jenkins is not displaying test results because 'set -e' is aborting the
> script too soon
[jira] [Commented] (HIVE-14341) Altered skewed location is not respected for list bucketing
[ https://issues.apache.org/jira/browse/HIVE-14341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505009#comment-15505009 ]

Hive QA commented on HIVE-14341:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829272/HIVE-14341.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10555 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_skewed_table]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_skewed_table1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_list_bucket]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[list_bucket_dml_2]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1233/testReport
Console output: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1233/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1233/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829272 - jenkins-PreCommit-HIVE-Build

> Altered skewed location is not respected for list bucketing
> -----------------------------------------------------------
>
>                 Key: HIVE-14341
>                 URL: https://issues.apache.org/jira/browse/HIVE-14341
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.1
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14341.1.patch, HIVE-14341.2.patch
>
> CREATE TABLE list_bucket_single (key STRING, value STRING)
>   SKEWED BY (key) ON (1,5,6) STORED AS DIRECTORIES;
> alter table list_bucket_single set skewed location
>   ("1"="/user/hive/warehouse/hdfs_skewed/new1");
> But when you insert a row with key 1, the location falls back to the default
> one.
[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-14792:
----------------------------------------
    Attachment: HIVE-14792.1.patch
[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-14792: Attachment: (was: HIVE-14792.1.patch) > AvroSerde reads the remote schema-file at least once per mapper, per table > reference. > - > > Key: HIVE-14792 > URL: https://issues.apache.org/jira/browse/HIVE-14792 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1, 2.1.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > Avro tables that use "external" schema files stored on HDFS can cause > excessive calls to {{FileSystem::open()}}, especially for queries that spawn > large numbers of mappers. > This is because of the following code in {{AvroSerDe::initialize()}}: > {code:title=AvroSerDe.java|borderStyle=solid} > public void initialize(Configuration configuration, Properties properties) > throws SerDeException { > // ... > if (hasExternalSchema(properties) > || columnNameProperty == null || columnNameProperty.isEmpty() > || columnTypeProperty == null || columnTypeProperty.isEmpty()) { > schema = determineSchemaOrReturnErrorSchema(configuration, properties); > } else { > // Get column names and sort order > columnNames = Arrays.asList(columnNameProperty.split(",")); > columnTypes = > TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); > schema = getSchemaFromCols(properties, columnNames, columnTypes, > columnCommentProperty); > > properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), > schema.toString()); > } > // ... > } > {code} > For tables using {{avro.schema.url}}, every time the SerDe is initialized > (i.e. at least once per mapper), the schema file is read remotely. For > queries with thousands of mappers, this leads to a stampede to the handful > (3?) datanodes that host the schema-file. In the best case, this causes > slowdowns. > It would be preferable to distribute the Avro-schema to all mappers as part > of the job-conf. 
The alternatives aren't exactly appealing: > # One can't rely solely on the {{column.list.types}} stored in the Hive > metastore. (HIVE-14789). > # {{avro.schema.literal}} might not always be usable, because of the > size-limit on table-parameters. The typical size of the Avro-schema file is > between 0.5-3MB, in my limited experience. Bumping the max table-parameter > size isn't a great solution. > If the {{avro.schema.file}} were read during query-planning, and made > available as part of table-properties (but not serialized into the > metastore), the downstream logic will remain largely intact. I have a patch > that does this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-14793: - Assignee: Siddharth Seth > Allow ptest branch to be specified, PROFILE override > > > Key: HIVE-14793 > URL: https://issues.apache.org/jira/browse/HIVE-14793 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14793.01.patch > > > Post HIVE-14734 - the profile is automatically determined. Add an option to > override this via Jenkins. Also add an option to specify the branch from > which ptest is built (This is hardcoded to github.com/apache/hive) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14793: -- Attachment: HIVE-14793.01.patch cc [~spena], [~prasanth_j] for review. > Allow ptest branch to be specified, PROFILE override > > > Key: HIVE-14793 > URL: https://issues.apache.org/jira/browse/HIVE-14793 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Siddharth Seth > Attachments: HIVE-14793.01.patch > > > Post HIVE-14734 - the profile is automatically determined. Add an option to > override this via Jenkins. Also add an option to specify the branch from > which ptest is built (This is hardcoded to github.com/apache/hive) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14680: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the review! > retain consistent splits /during/ (as opposed to across) LLAP failures on top > of HIVE-14589 > --- > > Key: HIVE-14680 > URL: https://issues.apache.org/jira/browse/HIVE-14680 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, > HIVE-14680.03.patch, HIVE-14680.patch > > > see HIVE-14589. > Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) > is to return locations for all slots to HostAffinitySplitLocationProvider, > the missing slots being inactive locations (based solely on the last slot > actually present). For the splits mapped to these locations, fall back via > different hash functions, or some sort of probing. > This still doesn't handle all the cases, namely when the last slots are gone > (consistent hashing is supposed to be good for this?); however for that we'd > need more involved coordination between nodes or a central updater to > indicate the number of nodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
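The probing idea in the comment above can be sketched roughly as follows. This is a hedged illustration, not Hive's actual HostAffinitySplitLocationProvider: the class and method names are invented, and the mixer is just one way to "fall back via different hash functions". Every slot keeps a location (so splits mapped to live slots keep their placement no matter which other nodes fail), and a split whose slot is inactive re-hashes to probe other slots.

```java
import java.util.BitSet;

// Illustrative sketch of the fallback-probing idea, not Hive code: all slots
// retain a location, inactive slots are marked, and a split whose primary
// slot is dead probes other slots via re-hashing.
public class SlotLocator {
    private final String[] locations; // one entry per slot, alive or not
    private final BitSet active;      // which slots are currently alive

    public SlotLocator(String[] locations, BitSet active) {
        this.locations = locations;
        this.active = active;
    }

    // Cheap integer mixer so each probe uses a different hash, instead of
    // all displaced splits piling onto the same fallback slot.
    private static int mix(int x) {
        x ^= x >>> 16; x *= 0x85ebca6b;
        x ^= x >>> 13; x *= 0xc2b2ae35;
        return x ^ (x >>> 16);
    }

    public String locate(String splitPath) {
        int h = splitPath.hashCode();
        for (int attempt = 0; attempt < locations.length * 4; ++attempt) {
            int slot = Math.floorMod(h, locations.length);
            if (active.get(slot)) {
                return locations[slot]; // live slot: stable assignment
            }
            h = mix(h + attempt); // dead slot: fall back via a different hash
        }
        // All probes hit dead slots; deterministically pick a live one.
        int any = active.nextSetBit(0);
        return any >= 0 ? locations[any] : null;
    }
}
```

The property this buys is the one the issue title asks for: a split whose primary slot is still alive resolves to the same location regardless of which *other* nodes have failed, so placement stays consistent during (not just across) failures.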
[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
[ https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504975#comment-15504975 ] Siddharth Seth commented on HIVE-14734: --- Sure. If you think that's what it is. I'm not sure the outputDir is set correctly either. > Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh > - > > Key: HIVE-14734 > URL: https://issues.apache.org/jira/browse/HIVE-14734 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 2.2.0 > > Attachments: HIVE-14734.2.patch, HIVE-14734.patch > > > NO PRECOMMIT TESTS > Currently, to execute tests on a new branch, a manual process must be done: > 1. Create a new Jenkins job with the new branch name > 2. Create a patch to jenkins-submit-build.sh with the new branch > 3. Create a profile properties file on the ptest master with the new branch > This jira will attempt to automate steps 1 and 2 by detecting the branch > profile from a patch to test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-14792: Description: Avro tables that use "external" schema files stored on HDFS can cause excessive calls to {{FileSystem::open()}}, especially for queries that spawn large numbers of mappers. This is because of the following code in {{AvroSerDe::initialize()}}: {code:title=AvroSerDe.java|borderStyle=solid} public void initialize(Configuration configuration, Properties properties) throws SerDeException { // ... if (hasExternalSchema(properties) || columnNameProperty == null || columnNameProperty.isEmpty() || columnTypeProperty == null || columnTypeProperty.isEmpty()) { schema = determineSchemaOrReturnErrorSchema(configuration, properties); } else { // Get column names and sort order columnNames = Arrays.asList(columnNameProperty.split(",")); columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); schema = getSchemaFromCols(properties, columnNames, columnTypes, columnCommentProperty); properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), schema.toString()); } // ... } {code} For tables using {{avro.schema.url}}, every time the SerDe is initialized (i.e. at least once per mapper), the schema file is read remotely. For queries with thousands of mappers, this leads to a stampede to the handful (3?) datanodes that host the schema-file. In the best case, this causes slowdowns. It would be preferable to distribute the Avro-schema to all mappers as part of the job-conf. The alternatives aren't exactly appealing: # One can't rely solely on the {{column.list.types}} stored in the Hive metastore. (HIVE-14789). # {{avro.schema.literal}} might not always be usable, because of the size-limit on table-parameters. The typical size of the Avro-schema file is between 0.5-3MB, in my limited experience. Bumping the max table-parameter size isn't a great solution. 
If the {{avro.schema.file}} were read during query-planning, and made available as part of table-properties (but not serialized into the metastore), the downstream logic will remain largely intact. I have a patch that does this. was: Avro tables that use "external" schema files stored on HDFS can cause excessive calls to {{FileSystem::open()}}, especially for queries that spawn large numbers of mappers. This is because of the following code in {{AvroSerDe::initialize()}}: {code:title=AvroSerDe.java|borderStyle=solid} public void initialize(Configuration configuration, Properties properties) throws SerDeException { // ... if (hasExternalSchema(properties) || columnNameProperty == null || columnNameProperty.isEmpty() || columnTypeProperty == null || columnTypeProperty.isEmpty()) { schema = determineSchemaOrReturnErrorSchema(configuration, properties); } else { // Get column names and sort order columnNames = Arrays.asList(columnNameProperty.split(",")); columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); schema = getSchemaFromCols(properties, columnNames, columnTypes, columnCommentProperty); properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), schema.toString()); } // ... } {code} For files using {{avro.schema.url}}, every time the SerDe is initialized (i.e. at least once per mapper), the schema file is read remotely. For queries with thousands of mappers, this leads to a stampede to the handful (3?) datanodes that host the schema-file. In the best case, this causes slowdowns. It would be preferable to distribute the Avro-schema to all mappers as part of the job-conf. The alternatives aren't exactly appealing: # One can't rely solely on the {{column.list.types}} stored in the Hive metastore. (HIVE-14789). # {{avro.schema.literal}} might not always be usable, because of the size-limit on table-parameters. The typical size of the Avro-schema file is between 0.5-3MB, in my limited experience. 
Bumping the max table-parameter size isn't a great solution. If the {{avro.schema.file}} were read during query-planning, and made available as part of table-properties (but not serialized into the metastore), the downstream logic will remain largely intact. I have a patch that does this. > AvroSerde reads the remote schema-file at least once per mapper, per table > reference. > - > > Key: HIVE-14792 > URL: https://issues.apache.org/jira/browse/HIVE-14792 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1, 2.1.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-14792.1.pa
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504961#comment-15504961 ] Sergey Shelukhin commented on HIVE-7926: That part was not actually done by me, I think it might be a twitter-like tag prefix ;) It can be ignored/dropped. > long-lived daemons for query fragment execution, I/O and caching > > > Key: HIVE-7926 > URL: https://issues.apache.org/jira/browse/HIVE-7926 > Project: Hive > Issue Type: New Feature >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: LLAPdesigndocument.pdf > > > We are proposing a new execution model for Hive that is a combination of > existing process-based tasks and long-lived daemons running on worker nodes. > These nodes can take care of efficient I/O, caching and query fragment > execution, while heavy lifting like most joins, ordering, etc. can be handled > by tasks. > The proposed model is not a 2-system solution for small and large queries; > nor is it a separate execution engine like MR or Tez. It can be used by > any Hive execution engine, if support is added; in the future even external > products (e.g. Pig) can use it. > The document with high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-14792: Attachment: HIVE-14792.1.patch This patch introduces an optimizer that prefetches the {{avro.schema.url}} contents, and modifies the table-info stored in the query-plan to contain the schema (as the {{avro.schema.literal}} property). The {{AvroSerDe}} is almost completely unchanged, and handles {{avro.schema.literal}} transparently. > AvroSerde reads the remote schema-file at least once per mapper, per table > reference. > - > > Key: HIVE-14792 > URL: https://issues.apache.org/jira/browse/HIVE-14792 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1, 2.1.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-14792.1.patch > > > Avro tables that use "external" schema files stored on HDFS can cause > excessive calls to {{FileSystem::open()}}, especially for queries that spawn > large numbers of mappers. > This is because of the following code in {{AvroSerDe::initialize()}}: > {code:title=AvroSerDe.java|borderStyle=solid} > public void initialize(Configuration configuration, Properties properties) > throws SerDeException { > // ... > if (hasExternalSchema(properties) > || columnNameProperty == null || columnNameProperty.isEmpty() > || columnTypeProperty == null || columnTypeProperty.isEmpty()) { > schema = determineSchemaOrReturnErrorSchema(configuration, properties); > } else { > // Get column names and sort order > columnNames = Arrays.asList(columnNameProperty.split(",")); > columnTypes = > TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); > schema = getSchemaFromCols(properties, columnNames, columnTypes, > columnCommentProperty); > > properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), > schema.toString()); > } > // ... > } > {code} > For files using {{avro.schema.url}}, every time the SerDe is initialized > (i.e. 
at least once per mapper), the schema file is read remotely. For > queries with thousands of mappers, this leads to a stampede to the handful > (3?) datanodes that host the schema-file. In the best case, this causes > slowdowns. > It would be preferable to distribute the Avro-schema to all mappers as part > of the job-conf. The alternatives aren't exactly appealing: > # One can't rely solely on the {{column.list.types}} stored in the Hive > metastore. (HIVE-14789). > # {{avro.schema.literal}} might not always be usable, because of the > size-limit on table-parameters. The typical size of the Avro-schema file is > between 0.5-3MB, in my limited experience. Bumping the max table-parameter > size isn't a great solution. > If the {{avro.schema.file}} were read during query-planning, and made > available as part of table-properties (but not serialized into the > metastore), the downstream logic will remain largely intact. I have a patch > that does this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
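The planning-time prefetch the patch describes (read {{avro.schema.url}} once per query and inline the result as {{avro.schema.literal}}, which the SerDe already handles transparently) can be sketched as below. This is a hedged illustration, not the actual patch: the class name is invented, and it reads through {{java.net.URL}} so the sketch is self-contained, where the real code would go through Hadoop's {{FileSystem}} API so that {{hdfs://}} URLs resolve.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Illustrative sketch, not the actual Hive patch: fetch the external Avro
// schema once at planning time and inline it into the table properties, so
// mappers never open the remote schema file themselves.
public class AvroSchemaPrefetcher {
    static final String SCHEMA_URL = "avro.schema.url";
    static final String SCHEMA_LITERAL = "avro.schema.literal";

    public static void prefetchSchema(Properties tableProps) throws IOException {
        String url = tableProps.getProperty(SCHEMA_URL);
        if (url == null || tableProps.getProperty(SCHEMA_LITERAL) != null) {
            return; // nothing to fetch, or a literal schema is already set
        }
        // One remote read per query plan, instead of one per mapper. The
        // real patch would use Hadoop's FileSystem here, not java.net.URL.
        try (InputStream in = new URL(url).openStream()) {
            String schema = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            // Downstream, AvroSerDe::initialize() sees the literal and
            // skips the remote fetch entirely.
            tableProps.setProperty(SCHEMA_LITERAL, schema);
        }
    }
}
```

Because the literal is set only on the in-plan table properties, not written back to the metastore, this sidesteps the table-parameter size limit mentioned above.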
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504935#comment-15504935 ] Shannon Ladymon commented on HIVE-7926: --- [~sershe], can you clarify why the # is used in #LLAP in the [design doc | https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf]? > long-lived daemons for query fragment execution, I/O and caching > > > Key: HIVE-7926 > URL: https://issues.apache.org/jira/browse/HIVE-7926 > Project: Hive > Issue Type: New Feature >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: LLAPdesigndocument.pdf > > > We are proposing a new execution model for Hive that is a combination of > existing process-based tasks and long-lived daemons running on worker nodes. > These nodes can take care of efficient I/O, caching and query fragment > execution, while heavy lifting like most joins, ordering, etc. can be handled > by tasks. > The proposed model is not a 2-system solution for small and large queries; > nor is it a separate execution engine like MR or Tez. It can be used by > any Hive execution engine, if support is added; in the future even external > products (e.g. Pig) can use it. > The document with high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14700) clean up file/txn information via a metastore thread
[ https://issues.apache.org/jira/browse/HIVE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14700: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to the feature branch. > clean up file/txn information via a metastore thread > > > Key: HIVE-14700 > URL: https://issues.apache.org/jira/browse/HIVE-14700 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: hive-14535 > > Attachments: HIVE-14700.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14700) clean up file/txn information via a metastore thread
[ https://issues.apache.org/jira/browse/HIVE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14700: Summary: clean up file/txn information via a metastore thread (was: clean up file/txn information via a metastore thread similar to compactor) > clean up file/txn information via a metastore thread > > > Key: HIVE-14700 > URL: https://issues.apache.org/jira/browse/HIVE-14700 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: hive-14535 > > Attachments: HIVE-14700.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14791) LLAP: Use FQDN when submitting work to LLAP
[ https://issues.apache.org/jira/browse/HIVE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14791: Summary: LLAP: Use FQDN when submitting work to LLAP (was: LLAP: Use FQDN for all communication ) > LLAP: Use FQDN when submitting work to LLAP > > > Key: HIVE-14791 > URL: https://issues.apache.org/jira/browse/HIVE-14791 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.2.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > > {code} > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > + socketAddress.getHostName()); > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > host = socketAddress.getHostName(); > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: > public static String getHostName() { > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: >return InetAddress.getLocalHost().getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > String name = address.getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > builder.setAmHost(address.getHostName()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: >nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), > localAddress.get().getPort()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > localAddress.get().getHostName(), vertex.getDagName(), > qIdProto.getDagIndex(), > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > new ExecutionContextImpl(localAddress.get().getHostName()), env, > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: >String hostName = MetricsUtils.getHostName(); > 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java: > .setBindAddress(addr.getHostName()) > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java: > request.getContainerIdString(), executionContext.getHostName(), > vertex.getDagName(), > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >String displayName = "LlapDaemonCacheMetrics-" + > MetricsUtils.getHostName(); > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName(); > llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java: > new LlapProtocolClientImpl(new Configuration(), > serverAddr.getHostName(), > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java: > builder.setAmHost(getAddress().getHostName()); > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: > String displayName = "LlapTaskSchedulerMetrics-" + > MetricsUtils.getHostName(); > {code} > In systems where the hostnames do not match FQDN, calling the > getCanonicalHostName() will allow for resolution of the hostname when > accessing from a different base domain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
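The distinction the last paragraph draws between {{getHostName()}} and {{getCanonicalHostName()}} can be illustrated with a small sketch. The helper name and the dot-check heuristic below are assumptions for illustration, not Hive code: {{InetAddress.getHostName()}} may return a short name such as {{llap-node7}}, while {{getCanonicalHostName()}} performs a reverse lookup and returns the fully-qualified form, which stays resolvable from a different base domain.

```java
import java.net.InetAddress;

// Illustrative helper, not Hive code: prefer the canonical (fully-qualified)
// name when the plain host name carries no domain part, so callers in a
// different base domain can still resolve it, per the issue description.
public class HostNames {
    // A short name like "llap-node7" has no dot; an FQDN such as
    // "llap-node7.example.com" does. (Heuristic for illustration only.)
    static boolean looksLikeFqdn(String host) {
        return host != null && host.indexOf('.') > 0;
    }

    static String resolvableName(InetAddress addr) {
        String host = addr.getHostName();
        // getCanonicalHostName() does a reverse DNS lookup and returns the
        // fully-qualified name where the system is configured with one.
        return looksLikeFqdn(host) ? host : addr.getCanonicalHostName();
    }

    public static void main(String[] args) {
        System.out.println(resolvableName(InetAddress.getLoopbackAddress()));
    }
}
```

Each of the call sites listed in the {code} block above returns whatever the node's resolver is configured with, which is exactly why short names leak through when `/etc/hosts` or DNS search domains hand back unqualified names.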
[jira] [Updated] (HIVE-14624) LLAP: Use FQDN when submitting work to LLAP
[ https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14624: Summary: LLAP: Use FQDN when submitting work to LLAP (was: LLAP: Use FQDN for all communication ) > LLAP: Use FQDN when submitting work to LLAP > > > Key: HIVE-14624 > URL: https://issues.apache.org/jira/browse/HIVE-14624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.2.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch, > HIVE-14624.03.patch, HIVE-14624.patch > > > {code} > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > + socketAddress.getHostName()); > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > host = socketAddress.getHostName(); > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: > public static String getHostName() { > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: >return InetAddress.getLocalHost().getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > String name = address.getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > builder.setAmHost(address.getHostName()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: >nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), > localAddress.get().getPort()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > localAddress.get().getHostName(), vertex.getDagName(), > qIdProto.getDagIndex(), > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > new ExecutionContextImpl(localAddress.get().getHostName()), env, > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: >String hostName = 
MetricsUtils.getHostName(); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java: > .setBindAddress(addr.getHostName()) > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java: > request.getContainerIdString(), executionContext.getHostName(), > vertex.getDagName(), > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >String displayName = "LlapDaemonCacheMetrics-" + > MetricsUtils.getHostName(); > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName(); > llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java: > new LlapProtocolClientImpl(new Configuration(), > serverAddr.getHostName(), > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java: > builder.setAmHost(getAddress().getHostName()); > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: > String displayName = "LlapTaskSchedulerMetrics-" + > MetricsUtils.getHostName(); > {code} > In systems where the hostnames do not match FQDN, calling the > getCanonicalHostName() will allow for resolution of the hostname when > accessing from a different base domain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14791) LLAP: Use FQDN for all communication
[ https://issues.apache.org/jira/browse/HIVE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14791: Summary: LLAP: Use FQDN for all communication (was: LLAP: Use FQDN when submitting work to LLAP ) > LLAP: Use FQDN for all communication > > > Key: HIVE-14791 > URL: https://issues.apache.org/jira/browse/HIVE-14791 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.2.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > > {code} > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > + socketAddress.getHostName()); > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > host = socketAddress.getHostName(); > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: > public static String getHostName() { > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: >return InetAddress.getLocalHost().getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > String name = address.getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > builder.setAmHost(address.getHostName()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: >nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), > localAddress.get().getPort()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > localAddress.get().getHostName(), vertex.getDagName(), > qIdProto.getDagIndex(), > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > new ExecutionContextImpl(localAddress.get().getHostName()), env, > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: >String hostName = MetricsUtils.getHostName(); > 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java: > .setBindAddress(addr.getHostName()) > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java: > request.getContainerIdString(), executionContext.getHostName(), > vertex.getDagName(), > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >String displayName = "LlapDaemonCacheMetrics-" + > MetricsUtils.getHostName(); > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName(); > llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java: > new LlapProtocolClientImpl(new Configuration(), > serverAddr.getHostName(), > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java: > builder.setAmHost(getAddress().getHostName()); > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: > String displayName = "LlapTaskSchedulerMetrics-" + > MetricsUtils.getHostName(); > {code} > In systems where the hostnames do not match FQDN, calling the > getCanonicalHostName() will allow for resolution of the hostname when > accessing from a different base domain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14624) LLAP: Use FQDN for all communication
[ https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14624: Attachment: HIVE-14624.03.patch unit test failure is a test-specific issue with a mock. Updated. [~sseth] will clone and rename this one since all the discussion has been here > LLAP: Use FQDN for all communication > - > > Key: HIVE-14624 > URL: https://issues.apache.org/jira/browse/HIVE-14624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.2.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch, > HIVE-14624.03.patch, HIVE-14624.patch > > > {code} > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > + socketAddress.getHostName()); > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > host = socketAddress.getHostName(); > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: > public static String getHostName() { > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: >return InetAddress.getLocalHost().getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > String name = address.getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > builder.setAmHost(address.getHostName()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: >nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), > localAddress.get().getPort()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > localAddress.get().getHostName(), vertex.getDagName(), > qIdProto.getDagIndex(), > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > new ExecutionContextImpl(localAddress.get().getHostName()), env, > 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: >String hostName = MetricsUtils.getHostName(); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java: > .setBindAddress(addr.getHostName()) > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java: > request.getContainerIdString(), executionContext.getHostName(), > vertex.getDagName(), > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >String displayName = "LlapDaemonCacheMetrics-" + > MetricsUtils.getHostName(); > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName(); > llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java: > new LlapProtocolClientImpl(new Configuration(), > serverAddr.getHostName(), > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java: > builder.setAmHost(getAddress().getHostName()); > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: > String displayName = "LlapTaskSchedulerMetrics-" + > MetricsUtils.getHostName(); > {code} > In systems where the hostnames do not match FQDN, calling the > getCanonicalHostName() will allow for resolution of the hostname when > accessing from a different base domain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
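The short-hostname-vs-FQDN distinction the description relies on has a direct analogue in most languages. As a minimal illustration (Python standing in for Hive's Java, purely for brevity — `socket.getfqdn()` plays the role of Java's `getCanonicalHostName()`):

```python
import socket

# Short hostname: whatever the machine calls itself; may omit the domain,
# which is exactly the case where cross-domain resolution fails.
short_name = socket.gethostname()

# Fully-qualified name: resolved through DNS, so it remains reachable when
# accessed from a different base domain -- the behaviour the issue asks for.
fqdn = socket.getfqdn()

print(short_name, fqdn)
```

On hosts where the configured hostname already is the FQDN the two values coincide, which is why the problem only surfaces in "systems where the hostnames do not match FQDN".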
[jira] [Updated] (HIVE-14700) clean up file/txn information via a metastore thread similar to compactor
[ https://issues.apache.org/jira/browse/HIVE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14700: Attachment: (was: HIVE-14700.WIP.patch) > clean up file/txn information via a metastore thread similar to compactor > - > > Key: HIVE-14700 > URL: https://issues.apache.org/jira/browse/HIVE-14700 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: hive-14535 > > Attachments: HIVE-14700.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP
[ https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504870#comment-15504870 ] Sergey Shelukhin commented on HIVE-14651: - lgtm +1 > Add a local cluster for Tez and LLAP > > > Key: HIVE-14651 > URL: https://issues.apache.org/jira/browse/HIVE-14651 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, > HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon
[ https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14790: --- Description: NO PRECOMMIT TESTS Jenkins is not displaying test results because 'set -e' is aborting the script too soon was:Jenkins is not displaying test results because 'set -e' is aborting the script too soon > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon > --- > > Key: HIVE-14790 > URL: https://issues.apache.org/jira/browse/HIVE-14790 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14790.1.patch > > > NO PRECOMMIT TESTS > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon
[ https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14790: --- Status: Patch Available (was: Open) > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon > --- > > Key: HIVE-14790 > URL: https://issues.apache.org/jira/browse/HIVE-14790 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14790.1.patch > > > NO PRECOMMIT TESTS > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon
[ https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504848#comment-15504848 ] Sergio Peña commented on HIVE-14790: [~sseth] This is the patch. > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon > --- > > Key: HIVE-14790 > URL: https://issues.apache.org/jira/browse/HIVE-14790 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14790.1.patch > > > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon
[ https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña reassigned HIVE-14790: -- Assignee: Sergio Peña > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon > --- > > Key: HIVE-14790 > URL: https://issues.apache.org/jira/browse/HIVE-14790 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14790.1.patch > > > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon
[ https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14790: --- Attachment: HIVE-14790.1.patch > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon > --- > > Key: HIVE-14790 > URL: https://issues.apache.org/jira/browse/HIVE-14790 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña > Attachments: HIVE-14790.1.patch > > > Jenkins is not displaying test results because 'set -e' is aborting the > script too soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
[ https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504831#comment-15504831 ] Sergio Peña commented on HIVE-14734: [~sseth] I know what it is. This is the last part of the script. {noformat} call_ptest_server --testHandle "$TEST_HANDLE" --endpoint "$PTEST_API_ENDPOINT" --logsEndpoint "$PTEST_LOG_ENDPOINT" \ --profile "$BUILD_PROFILE" ${optionalArgs[@]} "$@" ret=$? unpack_test_results exit $ret {noformat} The {{set -e}} at the beginning of the file is aborting the script when {{call_ptest_server}} returns a non-zero value, and the script does not unpack the test results. I'll create a quick patch to remove this. > Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh > - > > Key: HIVE-14734 > URL: https://issues.apache.org/jira/browse/HIVE-14734 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 2.2.0 > > Attachments: HIVE-14734.2.patch, HIVE-14734.patch > > > NO PRECOMMIT TESTS > Currently, to execute tests on a new branch, a manual process must be done: > 1. Create a new Jenkins job with the new branch name > 2. Create a patch to jenkins-submit-build.sh with the new branch > 3. Create a profile properties file on the ptest master with the new branch > This jira will attempt to automate steps 1 and 2 by detecting the branch > profile from a patch to test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
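The fix the comment describes — record the test command's exit status, still unpack the results, then exit with that status — is the classic "cleanup must survive a failing command under {{set -e}}" pattern. A hedged sketch of the same control flow in Python (not the actual patch; {{subprocess.run}} does not raise on a non-zero exit, which mirrors a shell call guarded from {{set -e}}):

```python
import subprocess

def call_and_always_unpack(cmd, unpack_results):
    """Run the test command, always run the unpack step, and report the
    command's own exit status. 'unpack_results' is a hypothetical stand-in
    for the script's unpack_test_results function."""
    ret = subprocess.run(cmd).returncode  # no exception on non-zero exit
    unpack_results()                      # runs even when the tests failed
    return ret
```

In the shell script itself the equivalent guard is invoking the call as `call_ptest_server ... || ret=$?` (or otherwise outside the `set -e` scope), so `unpack_test_results` still executes before `exit $ret`.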
[jira] [Comment Edited] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
[ https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504614#comment-15504614 ] Siddharth Seth edited comment on HIVE-14734 at 9/19/16 9:39 PM: [~spena] - test results are no longer available on Jenkins runs. Investigating, but I suspect it may be because of this jira. was (Author: sseth): [~spena] - test results are no longer available on Hadoop. Investigating, but I suspect it may be because of this jira. > Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh > - > > Key: HIVE-14734 > URL: https://issues.apache.org/jira/browse/HIVE-14734 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 2.2.0 > > Attachments: HIVE-14734.2.patch, HIVE-14734.patch > > > NO PRECOMMIT TESTS > Currently, to execute tests on a new branch, a manual process must be done: > 1. Create a new Jenkins job with the new branch name > 2. Create a patch to jenkins-submit-build.sh with the new branch > 3. Create a profile properties file on the ptest master with the new branch > This jira will attempt to automate steps 1 and 2 by detecting the branch > profile from a patch to test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14341) Altered skewed location is not respected for list bucketing
[ https://issues.apache.org/jira/browse/HIVE-14341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14341: Attachment: HIVE-14341.2.patch Patch-2: made the changes so the desc command will show skewed location for those locations not updated explicitly. With this patch, we will not automatically collect the skew mapping from the directory since that would cause issues if the location is updated explicitly. Rather, given a query like select * from list_bucket_single where key=1, if the skew location for key 1 is updated explicitly, then we will have the new location from HMS, otherwise, we will check the default location /list_bucket_single/key=1. > Altered skewed location is not respected for list bucketing > --- > > Key: HIVE-14341 > URL: https://issues.apache.org/jira/browse/HIVE-14341 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14341.1.patch, HIVE-14341.2.patch > > > CREATE TABLE list_bucket_single (key STRING, value STRING) > SKEWED BY (key) ON (1,5,6) STORED AS DIRECTORIES; > alter table list_bucket_single set skewed location > ("1"="/user/hive/warehouse/hdfs_skewed/new1"); > However, when you insert a row with key 1, the location falls back to the default > one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14789) Avro Table-reads bork when using SerDe-generated table-schema.
[ https://issues.apache.org/jira/browse/HIVE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-14789: Attachment: HIVE-14789-reproduce.patch This attachment has a qfile-test that reproduces the error I'm talking about, including a scrubbed data-file that's readable with the schema-literal, but not without it. This was a fairly common failure at Yahoo. Our current recommendation is for users to only use Avro tables with the schema-file with which they were produced. The metastore-based schema is to be ignored entirely. I've already tried modifying how the Avro schema is generated from {{columns.list.types}}, but I find that the conversions (to and fro) are lossy, brittle and unreliable. :/ > Avro Table-reads bork when using SerDe-generated table-schema. > -- > > Key: HIVE-14789 > URL: https://issues.apache.org/jira/browse/HIVE-14789 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 1.2.1, 2.0.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-14789-reproduce.patch > > > AvroSerDe allows one to skip the table-columns in a table-definition when > creating a table, as long as the TBLPROPERTIES includes a valid > {{avro.schema.url}} or {{avro.schema.literal}}. The table-columns are > inferred from processing the Avro schema file/literal. > The problem is that the inferred schema might not be congruent with the > actual schema in the Avro schema file/literal. 
Consider the following table > definition: > {code:sql} > CREATE TABLE avro_schema_break_1 > ROW FORMAT > SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS > INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > TBLPROPERTIES ('avro.schema.literal'='{ > "type": "record", > "name": "Messages", > "namespace": "net.myth", > "fields": [ > { > "name": "header", > "type": [ > "null", > { > "type": "record", > "name": "HeaderInfo", > "fields": [ > { > "name": "inferred_event_type", > "type": [ > "null", > "string" > ], > "default": null > }, > { > "name": "event_type", > "type": [ > "null", > "string" > ], > "default": null > }, > { > "name": "event_version", > "type": [ > "null", > "string" > ], > "default": null > } > ] > } > ] > }, > { > "name": "messages", > "type": { > "type": "array", > "items": { > "name": "MessageInfo", > "type": "record", > "fields": [ > { > "name": "message_id", > "type": [ > "null", > "string" > ], > "doc": "Message-ID" > }, > { > "name": "received_date", > "type": [ > "null", > "long" > ], > "doc": "Received Date" > }, > { > "name": "sent_date", > "type": [ > "null", > "long" > ] > }, > { > "name": "from_name", > "type": [ > "null", > "string" > ] > }, > { > "name": "flags", > "type": [ > "null", > { > "type": "record", > "name": "Flags", > "fields": [ > { > "name": "is_seen", > "type": [ > "null", > "boolean" > ], > "default": null > }, > { > "name": "is_read", > "type": [ > "null", > "boolean" > ], > "default": null > }, > { > "name": "is_flagged", > "type": [ > "null", > "boolean" > ], > "default": null > } >
[jira] [Assigned] (HIVE-14789) Avro Table-reads bork when using SerDe-generated table-schema.
[ https://issues.apache.org/jira/browse/HIVE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-14789: --- Assignee: Mithun Radhakrishnan > Avro Table-reads bork when using SerDe-generated table-schema. > -- > > Key: HIVE-14789 > URL: https://issues.apache.org/jira/browse/HIVE-14789 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 1.2.1, 2.0.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > AvroSerDe allows one to skip the table-columns in a table-definition when > creating a table, as long as the TBLPROPERTIES includes a valid > {{avro.schema.url}} or {{avro.schema.literal}}. The table-columns are > inferred from processing the Avro schema file/literal. > The problem is that the inferred schema might not be congruent with the > actual schema in the Avro schema file/literal. Consider the following table > definition: > {code:sql} > CREATE TABLE avro_schema_break_1 > ROW FORMAT > SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS > INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > TBLPROPERTIES ('avro.schema.literal'='{ > "type": "record", > "name": "Messages", > "namespace": "net.myth", > "fields": [ > { > "name": "header", > "type": [ > "null", > { > "type": "record", > "name": "HeaderInfo", > "fields": [ > { > "name": "inferred_event_type", > "type": [ > "null", > "string" > ], > "default": null > }, > { > "name": "event_type", > "type": [ > "null", > "string" > ], > "default": null > }, > { > "name": "event_version", > "type": [ > "null", > "string" > ], > "default": null > } > ] > } > ] > }, > { > "name": "messages", > "type": { > "type": "array", > "items": { > "name": "MessageInfo", > "type": "record", > "fields": [ > { > "name": "message_id", > "type": [ > "null", > "string" > ], > "doc": "Message-ID" > }, > { > "name": 
"received_date", > "type": [ > "null", > "long" > ], > "doc": "Received Date" > }, > { > "name": "sent_date", > "type": [ > "null", > "long" > ] > }, > { > "name": "from_name", > "type": [ > "null", > "string" > ] > }, > { > "name": "flags", > "type": [ > "null", > { > "type": "record", > "name": "Flags", > "fields": [ > { > "name": "is_seen", > "type": [ > "null", > "boolean" > ], > "default": null > }, > { > "name": "is_read", > "type": [ > "null", > "boolean" > ], > "default": null > }, > { > "name": "is_flagged", > "type": [ > "null", > "boolean" > ], > "default": null > } > ] > } > ], > "default": null > } > ] > } > } > } > ] > }'); > {code} > This produces a table with the following schema: > {noformat} > 2016-09-19T13:23:42,934 DEBUG [0ce7e586-13ea-4390-ac2a-6dac36e8a216 main] > hive.log: DDL: struct avro_schema_break_1 { > struct > header, > list>> > messages} > {noformat} > Data written to this table using the AvroSchema from {{avro.schema.literal}} > using Pig's {{AvroStorage}} cannot be read using Hive using the generated > table schema. This is the exception one sees: > {noforma
[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
[ https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504614#comment-15504614 ] Siddharth Seth commented on HIVE-14734: --- [~spena] - test results are no longer available on Hadoop. Investigating, but I suspect it may be because of this jira. > Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh > - > > Key: HIVE-14734 > URL: https://issues.apache.org/jira/browse/HIVE-14734 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 2.2.0 > > Attachments: HIVE-14734.2.patch, HIVE-14734.patch > > > NO PRECOMMIT TESTS > Currently, to execute tests on a new branch, a manual process must be done: > 1. Create a new Jenkins job with the new branch name > 2. Create a patch to jenkins-submit-build.sh with the new branch > 3. Create a profile properties file on the ptest master with the new branch > This jira will attempt to automate steps 1 and 2 by detecting the branch > profile from a patch to test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504607#comment-15504607 ] Hive QA commented on HIVE-14680: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829250/HIVE-14680.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10554 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1232/testReport Console output: https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1232/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1232/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12829250 - jenkins-PreCommit-HIVE-Build > retain consistent splits /during/ (as opposed to across) LLAP failures on top > of HIVE-14589 > --- > > Key: HIVE-14680 > URL: https://issues.apache.org/jira/browse/HIVE-14680 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, > HIVE-14680.03.patch, HIVE-14680.patch > > > see HIVE-14589. 
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) > is to return locations for all slots to HostAffinitySplitLocationProvider, > the missing slots being inactive locations (based solely on the last slot > actually present). For the splits mapped to these locations, fall back via > different hash functions, or some sort of probing. > This still doesn't handle all the cases, namely when the last slots are gone > (consistent hashing is supposed to be good for this?); however for that we'd > need more involved coordination between nodes or a central updater to > indicate the number of nodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
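The probing idea sketched in the description — keep slots for all locations including inactive ones, and let a split that hashes to an inactive slot fall back via a different hash — can be illustrated with a small sketch. This is Python with hypothetical `locations`/`active` lists standing in for the registry state, not Hive's actual HostAffinitySplitLocationProvider code:

```python
import hashlib

def pick_location(split_path, locations, active, max_probes=16):
    """Deterministically map a split to an active location.

    Inactive locations keep their slots in the ring, preserving the
    'all slots' layout the comment proposes; a split that hashes to an
    inactive slot retries with successively salted hashes.
    """
    for probe in range(max_probes):
        digest = hashlib.md5(f"{split_path}#{probe}".encode()).hexdigest()
        slot = int(digest, 16) % len(locations)
        if active[slot]:
            return locations[slot]
    return None  # every probe hit an inactive slot; caller must fall back
```

Because the slot count stays fixed while a node is merely down, splits that never hash to a dead slot keep their original location — the consistency-during-failures property the issue is after. As the description notes, this still degrades when slots are removed outright, which is the case consistent hashing or central coordination would have to cover.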
[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests
[ https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504604#comment-15504604 ] Siddharth Seth commented on HIVE-14461: --- Ping [~prasanth_j] for review. > Investigate HBaseMinimrCliDriver tests > -- > > Key: HIVE-14461 > URL: https://issues.apache.org/jira/browse/HIVE-14461 > Project: Hive > Issue Type: Sub-task > Components: Tests >Reporter: Zoltan Haindrich > Attachments: HIVE-14461.01.patch > > > during HIVE-1 I've encountered an odd thing: > HBaseMinimrCliDriver only executes a single test...and that test is set using > the qfile selector...which looks out-of-place. > The only test it executes doesn't follow regular qtest file naming...and has > an extension 'm' > At least the file should be renamed, but I think the change wasn't > intentional -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14788) Investigate how to access permanent function with restarting HS2 if load balancer is configured
[ https://issues.apache.org/jira/browse/HIVE-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504515#comment-15504515 ] BELUGA BEHR commented on HIVE-14788: The current option, as I understand it: # Copy target JAR file to each host into Hive auxiliary directory # Restart one HiveServer2 instance # Connect directly to that one HiveServer2 instance # Create the function # Restart the rest of the HiveServer2 instances Restarting the rest of the HiveServer2 instances will cause them to pick up the new JAR file from the auxiliary directory and also re-load the list of functions from the HMS. This can perhaps be improved with a read-through cache for functions, with a timeout for each entry (TTL). When a function is encountered, if it is not in the cache, HS2 can attempt to grab the function information from the HMS. The timeout is important because if a user is to drop a function, that function needs a way to be dropped from the HS2 caches. With this setup, the process for adding a function is simplified: # Copy target JAR file to each host into Hive auxiliary directory # Restart all HiveServer2 instances # Connect to any HiveServer2 instance through the load balancer # Create the function > Investigate how to access permanent function with restarting HS2 if load > balancer is configured > --- > > Key: HIVE-14788 > URL: https://issues.apache.org/jira/browse/HIVE-14788 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Aihua Xu >Assignee: Aihua Xu > > When load balancer is configured for multiple HS2 servers, seems we need to > restart each HS2 server to get permanent function to work. Since the command > "reload function" issued from the client to refresh the global registry may > not be targeted to a specific HS2 server, some servers may not get refreshed > and ClassNotFoundException may be thrown later. > Investigate if it's an issue and a good solution for it. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
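The read-through TTL cache proposed in the comment above is straightforward to sketch. The loader below is a hypothetical stand-in for the HMS lookup — this illustrates the caching idea, not HiveServer2's actual function registry:

```python
import time

class TTLFunctionCache:
    """Per-entry-TTL read-through cache for function metadata."""

    def __init__(self, load_from_hms, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load_from_hms   # hypothetical HMS lookup callable
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._entries = {}           # name -> (value, expires_at)

    def get(self, name):
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]            # fresh cache hit
        value = self._load(name)     # miss or expired: read through to HMS
        self._entries[name] = (value, now + self._ttl)
        return value
```

The TTL bounds how long a dropped function can linger in any one HS2 instance, which is exactly the staleness concern the comment raises about drops propagating through a load balancer.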
[jira] [Commented] (HIVE-13703) "msck repair" on table with non-partition subdirectories reporting partitions not in metastore
[ https://issues.apache.org/jira/browse/HIVE-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504433#comment-15504433 ] Sergey Shelukhin commented on HIVE-13703: - Would this be fixed by HIVE-14511, or otherwise should it use a similar approach (looking for the expected directory structure in the first place, rather than catching errors)? > "msck repair" on table with non-partition subdirectories reporting partitions > not in metastore > -- > > Key: HIVE-13703 > URL: https://issues.apache.org/jira/browse/HIVE-13703 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0, 1.0.0, 1.2.1 >Reporter: Ana Gillan >Assignee: Alina Abramova > Attachments: HIVE-13703.patch > > > PROBLEM: Subdirectories created with UNION ALL are listed in {{show > partitions}} output, but show up as {{Partitions not in metastore}} in {{msck > repair}} output. > STEPS TO REPRODUCE: Table created from {{CTAS ... UNION ALL}} DDL > {code} > hive> msck repair table meter_001; > OK > Partitions not in metastore: meter_001:tech_datestamp=2016-03-09/1 > meter_001:tech_datestamp=2016-03-09/2 meter_001:tech_datestamp=2016-03-10/1 > meter_001:tech_datestamp=2016-03-10/2 meter_001:tech_datestamp=2016-03-11/1 > meter_001:tech_datestamp=2016-03-11/2 meter_001:tech_datestamp=2016-03-12/1 > meter_001:tech_datestamp=2016-03-12/2 meter_001:tech_datestamp=2016-03-13/1 > meter_001:tech_datestamp=2016-03-13/2 meter_001:tech_datestamp=2016-03-14/1 > meter_001:tech_datestamp=2016-03-14/2 meter_001:tech_datestamp=2016-03-15/1 > meter_001:tech_datestamp=2016-03-15/2 meter_001:tech_datestamp=2016-03-16/1 > meter_001:tech_datestamp=2016-03-16/2 meter_001:tech_datestamp=2016-03-17/1 > meter_001:tech_datestamp=2016-03-17/2 meter_001:tech_datestamp=2016-03-18/1 > meter_001:tech_datestamp=2016-03-18/2 meter_001:tech_datestamp=2016-03-19/1 > meter_001:tech_datestamp=2016-03-19/2 meter_001:tech_datestamp=2016-03-20/1 > meter_001:tech_datestamp=2016-03-20/2 
meter_001:tech_datestamp=2016-03-21/1 > meter_001:tech_datestamp=2016-03-21/2 meter_001:tech_datestamp=2016-03-22/1 > meter_001:tech_datestamp=2016-03-22/2 meter_001:tech_datestamp=2016-03-23/1 > meter_001:tech_datestamp=2016-03-23/2 meter_001:tech_datestamp=2016-03-24/1 > meter_001:tech_datestamp=2016-03-24/2 meter_001:tech_datestamp=2016-03-25/1 > meter_001:tech_datestamp=2016-03-25/2 meter_001:tech_datestamp=2016-03-26/1 > meter_001:tech_datestamp=2016-03-26/2 meter_001:tech_datestamp=2016-03-27/1 > meter_001:tech_datestamp=2016-03-27/2 meter_001:tech_datestamp=2016-03-28/1 > meter_001:tech_datestamp=2016-03-28/2 meter_001:tech_datestamp=2016-03-29/1 > meter_001:tech_datestamp=2016-03-29/2 meter_001:tech_datestamp=2016-03-30/1 > meter_001:tech_datestamp=2016-03-30/2 meter_001:tech_datestamp=2016-03-31/1 > meter_001:tech_datestamp=2016-03-31/2 meter_001:tech_datestamp=2016-04-01/1 > meter_001:tech_datestamp=2016-04-01/2 meter_001:tech_datestamp=2016-04-02/1 > meter_001:tech_datestamp=2016-04-02/2 meter_001:tech_datestamp=2016-04-03/1 > meter_001:tech_datestamp=2016-04-03/2 meter_001:tech_datestamp=2016-04-04/1 > meter_001:tech_datestamp=2016-04-04/2 meter_001:tech_datestamp=2016-04-05/1 > meter_001:tech_datestamp=2016-04-05/2 meter_001:tech_datestamp=2016-04-06/1 > meter_001:tech_datestamp=2016-04-06/2 > Time taken: 15.996 seconds, Fetched: 1 row(s) > {code} > {code} > hive> show partitions meter_001; > OK > tech_datestamp=2016-03-09 > tech_datestamp=2016-03-10 > tech_datestamp=2016-03-11 > tech_datestamp=2016-03-12 > tech_datestamp=2016-03-13 > tech_datestamp=2016-03-14 > tech_datestamp=2016-03-15 > tech_datestamp=2016-03-16 > tech_datestamp=2016-03-17 > tech_datestamp=2016-03-18 > tech_datestamp=2016-03-19 > tech_datestamp=2016-03-20 > tech_datestamp=2016-03-21 > tech_datestamp=2016-03-22 > tech_datestamp=2016-03-23 > tech_datestamp=2016-03-24 > tech_datestamp=2016-03-25 > tech_datestamp=2016-03-26 > tech_datestamp=2016-03-27 > tech_datestamp=2016-03-28 > 
tech_datestamp=2016-03-29 > tech_datestamp=2016-03-30 > tech_datestamp=2016-03-31 > tech_datestamp=2016-04-01 > tech_datestamp=2016-04-02 > tech_datestamp=2016-04-03 > tech_datestamp=2016-04-04 > tech_datestamp=2016-04-05 > tech_datestamp=2016-04-06 > Time taken: 0.417 seconds, Fetched: 29 row(s) > {code} > Ideally msck repair should ignore subdirectory if that additional partition > column doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Target Version/s: 1.3.0, 2.1.1, 2.0.2 > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Fix For: 2.2.0 > > Attachments: HIVE-14251.1.patch, HIVE-14251.2.patch, > HIVE-14251.3.patch, HIVE-14251.4.patch, HIVE-14251.5.patch, HIVE-14251.6.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. Seems the common data type is resolved > to the last c3 which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504407#comment-15504407 ] Aihua Xu commented on HIVE-14251: - Yes. It should affect all of them. I will backport to those branches. > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Fix For: 2.2.0 > > Attachments: HIVE-14251.1.patch, HIVE-14251.2.patch, > HIVE-14251.3.patch, HIVE-14251.4.patch, HIVE-14251.5.patch, HIVE-14251.6.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. Seems the common data type is resolved > to the last c3 which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on
[ https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504358#comment-15504358 ] Sergey Shelukhin commented on HIVE-14783: - [~ashutoshc] what is the effect of this bugfix on queries (i.e. what is the user-observable behavior that it fixes)? > bucketing column should be part of sorting for delete/update operation when > spdo is on > -- > > Key: HIVE-14783 > URL: https://issues.apache.org/jira/browse/HIVE-14783 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Transactions >Affects Versions: 2.2.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 2.2.0 > > Attachments: HIVE-14783.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504346#comment-15504346 ] Sergey Shelukhin commented on HIVE-14251: - Does this affect branch-1? Having incorrect results should warrant a fix in all the branches (2.1, 2.0, 1.3?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504329#comment-15504329 ] Sergey Shelukhin commented on HIVE-14680: - One-byte change. > retain consistent splits /during/ (as opposed to across) LLAP failures on top > of HIVE-14589 > --- > > Key: HIVE-14680 > URL: https://issues.apache.org/jira/browse/HIVE-14680 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, > HIVE-14680.03.patch, HIVE-14680.patch > > > see HIVE-14589. > Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) > is to return locations for all slots to HostAffinitySplitLocationProvider, > the missing slots being inactive locations (based solely on the last slot > actually present). For the splits mapped to these locations, fall back via > different hash functions, or some sort of probing. > This still doesn't handle all the cases, namely when the last slots are gone > (consistent hashing is supposed to be good for this?); however for that we'd > need more involved coordination between nodes or a central updater to > indicate the number of nodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14680: Attachment: HIVE-14680.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504324#comment-15504324 ] Sergey Shelukhin commented on HIVE-14680: - [~sseth] I was assuming the normal block/split boundaries were a multiple of a large-ish power of two, so this would suffice. Apparently there's no such restriction. +-3 can affect another bit; however, if we make no assumptions about split boundaries, we cannot tell which way the 3 goes (e.g. for 31323, we don't know if it has to be consistent with 31320 or 31326). I guess we can just remove an extra bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
[ https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504294#comment-15504294 ] Siddharth Seth commented on HIVE-14734: --- Very useful. Thank you. https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/ is the new URL for the job. (instead of PreCommit-HIVE-master-Build) > Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh > - > > Key: HIVE-14734 > URL: https://issues.apache.org/jira/browse/HIVE-14734 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 2.2.0 > > Attachments: HIVE-14734.2.patch, HIVE-14734.patch > > > NO PRECOMMIT TESTS > Currently, to execute tests on a new branch, a manual process must be done: > 1. Create a new Jenkins job with the new branch name > 2. Create a patch to jenkins-submit-build.sh with the new branch > 3. Create a profile properties file on the ptest master with the new branch > This jira will attempt to automate steps 1 and 2 by detecting the branch > profile from a patch to test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504305#comment-15504305 ] Gopal V commented on HIVE-14680: The off-by value is usually the file magic for the ORC file, "ORC" (3 bytes). BISplit will ignore it and do (0+32Mb), while the ETLSplit will start at the 1st stripe (3+33.99Mb). This is not expected to happen for any split other than stripe #1 of a file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504287#comment-15504287 ] Siddharth Seth commented on HIVE-14680: --- bq. As for removing the 2 lowest bits, yes Let me clarify the question. Block boundary is 30MB. A split created by reading footers generates the split start as 30MB + 3 bytes. A split created without reading footers generates the start offset as 30MB. Will removing the 2 lower bits provide the same start offset for both splits? Otherwise these splits are not consistent, and will not go to the same node. 30MB is probably a bad example. Will this work in all cases (e.g. 32MB - 2 bytes)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
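The offset question in this exchange can be sketched numerically. A hypothetical `normalize` helper (illustrative only, not the actual patch) shows why masking off low bits makes the footer-based start (30MB + 3 bytes of ORC magic) agree with the footer-less start (30MB), yet still fails for an offset just below a masking boundary such as 32MB - 2:

```python
MB = 1024 * 1024

def normalize(offset, low_bits=16):
    # Drop the low bits so nearby split-start offsets compare equal.
    return offset >> low_bits

# Footer-based split (30MB + 3-byte ORC magic) vs. footer-less split:
# both normalize to the same value, so they hash to the same node.
assert normalize(30 * MB + 3) == normalize(30 * MB)

# But an offset just below a masking boundary disagrees with its
# neighbor on the other side, so masking alone is not sufficient.
assert normalize(32 * MB - 2) != normalize(32 * MB)
```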
[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP
[ https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504277#comment-15504277 ] Prasanth Jayachandran commented on HIVE-14651: -- +1 > Add a local cluster for Tez and LLAP > > > Key: HIVE-14651 > URL: https://issues.apache.org/jira/browse/HIVE-14651 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, > HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP
[ https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504276#comment-15504276 ] Prasanth Jayachandran commented on HIVE-14651: -- That's the one. Makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP
[ https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504272#comment-15504272 ] Siddharth Seth commented on HIVE-14651: --- Which comment are you referring to? (The one which says "Force fs to file://, setup staging dir"?) The staging dir is set up within the Hive directories; otherwise Tez takes care of creating the dirs based on configuration. Once local mode works properly, these comments will get resolved (I'd like to allow local mode to work with either file:// or hdfs. Tez does not support hdfs with local mode yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14781) ptest killall command does not work
[ https://issues.apache.org/jira/browse/HIVE-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14781: -- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the review, [~prasanth_j] > ptest killall command does not work > --- > > Key: HIVE-14781 > URL: https://issues.apache.org/jira/browse/HIVE-14781 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14781.01.patch > > > killall -f is not a valid flag. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP
[ https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504223#comment-15504223 ] Prasanth Jayachandran commented on HIVE-14651: -- [~sseth] Regarding your comments in the code: how does it work without setting up the staging dir? Is there a default that the Tez AM sets up? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP
[ https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504202#comment-15504202 ] Siddharth Seth commented on HIVE-14651: --- [~prasanth_j], [~sershe] - the test results look good now. Any other comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on
[ https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-14783: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Pushed to master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on
[ https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504005#comment-15504005 ] Prasanth Jayachandran commented on HIVE-14783: -- +1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14784) Operation logs are disabled automatically if the parent directory does not exist.
[ https://issues.apache.org/jira/browse/HIVE-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503888#comment-15503888 ] Yongzhi Chen commented on HIVE-14784: - Had a discussion with Naveen. He will add a warning for old log files that no longer exist. > Operation logs are disabled automatically if the parent directory does not > exist. > - > > Key: HIVE-14784 > URL: https://issues.apache.org/jira/browse/HIVE-14784 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > Attachments: HIVE-14784.patch > > > Operation logging is disabled automatically for the query if for some reason > the parent directory (named after the hive session id) that gets created when > the session is established gets deleted (for any reason). For ex: if the > operation logdir is /tmp which automatically can get purged at a configured > interval by the OS. > Running a query from that session leads to > {code} > 2016-09-15 15:09:16,723 WARN org.apache.hive.service.cli.operation.Operation: > Unable to create operation log file: > /tmp/hive/operation_logs/b8809985-6b38-47ec-a49b-6158a67cd9fc/d35414f7-2418-426c-8489-c6f643ca4599 > java.io.IOException: No such file or directory > at java.io.UnixFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:1012) > at > org.apache.hive.service.cli.operation.Operation.createOperationLog(Operation.java:195) > at > org.apache.hive.service.cli.operation.Operation.beforeRun(Operation.java:237) > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:255) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:398) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:385) > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271) > at > 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:490) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > This later leads to errors like (more prominent when using HUE as HUE does > not close hive sessions and attempts to retrieve the operations logs days > after they were created). 
> {code} > WARN org.apache.hive.service.cli.thrift.ThriftCLIService: Error fetching > results: > org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated > with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, > getHandleIdentifier()=d35414f7-2418-426c-8489-c6f643ca4599] > at > org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:259) > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:701) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:676) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(
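A sketch of the defensive behavior the report argues for (the function and parameter names are illustrative, not the actual Hive patch): recreate the per-session directory if an external cleaner purged it, before creating the per-operation log file, rather than silently disabling operation logging.

```python
import os

def create_operation_log(session_log_dir, operation_id):
    # Recreate the session directory if it was purged out from under us;
    # the File.createNewFile() call in the stack trace fails precisely
    # because the parent directory no longer exists.
    os.makedirs(session_log_dir, exist_ok=True)
    path = os.path.join(session_log_dir, operation_id)
    with open(path, "a"):
        pass  # create the (empty) operation log file
    return path
```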
[jira] [Updated] (HIVE-13703) "msck repair" on table with non-partition subdirectories reporting partitions not in metastore
[ https://issues.apache.org/jira/browse/HIVE-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alina Abramova updated HIVE-13703: -- Attachment: HIVE-13703.patch This patch fixes this issue. Could somebody review it? > "msck repair" on table with non-partition subdirectories reporting partitions > not in metastore > -- > > Key: HIVE-13703 > URL: https://issues.apache.org/jira/browse/HIVE-13703 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0, 1.0.0, 1.2.1 >Reporter: Ana Gillan >Assignee: Alina Abramova > Attachments: HIVE-13703.patch > > > PROBLEM: Subdirectories created with UNION ALL are listed in {{show > partitions}} output, but show up as {{Partitions not in metastore}} in {{msck > repair}} output. > STEPS TO REPRODUCE: Table created from {{CTAS ... UNION ALL}} DDL > {code} > hive> msck repair table meter_001; > OK > Partitions not in metastore: meter_001:tech_datestamp=2016-03-09/1 > meter_001:tech_datestamp=2016-03-09/2 meter_001:tech_datestamp=2016-03-10/1 > meter_001:tech_datestamp=2016-03-10/2 meter_001:tech_datestamp=2016-03-11/1 > meter_001:tech_datestamp=2016-03-11/2 meter_001:tech_datestamp=2016-03-12/1 > meter_001:tech_datestamp=2016-03-12/2 meter_001:tech_datestamp=2016-03-13/1 > meter_001:tech_datestamp=2016-03-13/2 meter_001:tech_datestamp=2016-03-14/1 > meter_001:tech_datestamp=2016-03-14/2 meter_001:tech_datestamp=2016-03-15/1 > meter_001:tech_datestamp=2016-03-15/2 meter_001:tech_datestamp=2016-03-16/1 > meter_001:tech_datestamp=2016-03-16/2 meter_001:tech_datestamp=2016-03-17/1 > meter_001:tech_datestamp=2016-03-17/2 meter_001:tech_datestamp=2016-03-18/1 > meter_001:tech_datestamp=2016-03-18/2 meter_001:tech_datestamp=2016-03-19/1 > meter_001:tech_datestamp=2016-03-19/2 meter_001:tech_datestamp=2016-03-20/1 > meter_001:tech_datestamp=2016-03-20/2 meter_001:tech_datestamp=2016-03-21/1 > meter_001:tech_datestamp=2016-03-21/2 meter_001:tech_datestamp=2016-03-22/1 > 
meter_001:tech_datestamp=2016-03-22/2 meter_001:tech_datestamp=2016-03-23/1 > meter_001:tech_datestamp=2016-03-23/2 meter_001:tech_datestamp=2016-03-24/1 > meter_001:tech_datestamp=2016-03-24/2 meter_001:tech_datestamp=2016-03-25/1 > meter_001:tech_datestamp=2016-03-25/2 meter_001:tech_datestamp=2016-03-26/1 > meter_001:tech_datestamp=2016-03-26/2 meter_001:tech_datestamp=2016-03-27/1 > meter_001:tech_datestamp=2016-03-27/2 meter_001:tech_datestamp=2016-03-28/1 > meter_001:tech_datestamp=2016-03-28/2 meter_001:tech_datestamp=2016-03-29/1 > meter_001:tech_datestamp=2016-03-29/2 meter_001:tech_datestamp=2016-03-30/1 > meter_001:tech_datestamp=2016-03-30/2 meter_001:tech_datestamp=2016-03-31/1 > meter_001:tech_datestamp=2016-03-31/2 meter_001:tech_datestamp=2016-04-01/1 > meter_001:tech_datestamp=2016-04-01/2 meter_001:tech_datestamp=2016-04-02/1 > meter_001:tech_datestamp=2016-04-02/2 meter_001:tech_datestamp=2016-04-03/1 > meter_001:tech_datestamp=2016-04-03/2 meter_001:tech_datestamp=2016-04-04/1 > meter_001:tech_datestamp=2016-04-04/2 meter_001:tech_datestamp=2016-04-05/1 > meter_001:tech_datestamp=2016-04-05/2 meter_001:tech_datestamp=2016-04-06/1 > meter_001:tech_datestamp=2016-04-06/2 > Time taken: 15.996 seconds, Fetched: 1 row(s) > {code} > {code} > hive> show partitions meter_001; > OK > tech_datestamp=2016-03-09 > tech_datestamp=2016-03-10 > tech_datestamp=2016-03-11 > tech_datestamp=2016-03-12 > tech_datestamp=2016-03-13 > tech_datestamp=2016-03-14 > tech_datestamp=2016-03-15 > tech_datestamp=2016-03-16 > tech_datestamp=2016-03-17 > tech_datestamp=2016-03-18 > tech_datestamp=2016-03-19 > tech_datestamp=2016-03-20 > tech_datestamp=2016-03-21 > tech_datestamp=2016-03-22 > tech_datestamp=2016-03-23 > tech_datestamp=2016-03-24 > tech_datestamp=2016-03-25 > tech_datestamp=2016-03-26 > tech_datestamp=2016-03-27 > tech_datestamp=2016-03-28 > tech_datestamp=2016-03-29 > tech_datestamp=2016-03-30 > tech_datestamp=2016-03-31 > tech_datestamp=2016-04-01 > 
tech_datestamp=2016-04-02 > tech_datestamp=2016-04-03 > tech_datestamp=2016-04-04 > tech_datestamp=2016-04-05 > tech_datestamp=2016-04-06 > Time taken: 0.417 seconds, Fetched: 29 row(s) > {code} > Ideally msck repair should ignore subdirectory if that additional partition > column doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14568) Hive Decimal Returns NULL
[ https://issues.apache.org/jira/browse/HIVE-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503847#comment-15503847 ] Akhil Chalamalasetty edited comment on HIVE-14568 at 9/19/16 3:58 PM: -- Thanks for the elaborate explanation Zhang. We will work around this issue by casting the column to a lower precision & scale. Since we have a few developers migrating from ORACLE and Postgres SQL, we thought this would be a feature request to ease the usage of Hive. Please let us know if there is a way to introduce such a mode on Hive and if that would have any performance impact once implemented. Regards, Akhil was (Author: akhilnaidu): Thanks Zhang. We will workaround this issue by casting the column to a lower precision & scale. Since we have a few developers migrating from ORACLE and Postgres SQL, we thought this would be a feature request to ease the usage of Hive. Please let us know if there is a way to introduce such a mode on Hive and if that would have a any performance impacts once implemented. Regards, AKhil > Hive Decimal Returns NULL > - > > Key: HIVE-14568 > URL: https://issues.apache.org/jira/browse/HIVE-14568 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.0.0, 1.2.0 > Environment: Centos 6.7, Hadoop 2.7.2,hive 1.0.0,2.0 >Reporter: gurmukh singh >Assignee: Xuefu Zhang > > Hi > I was under the impression that the bug: > https://issues.apache.org/jira/browse/HIVE-5022 got fixed. But, I see the > same issue in Hive 1.0 and hive 1.2 as well. > hive> desc mul_table; > OK > prc decimal(38,28) > vol decimal(38,10) > Time taken: 0.068 seconds, Fetched: 2 row(s) > hive> select prc, vol, prc*vol as cost from mul_table; > OK > 1.2 200 NULL > 1.44 200 NULL > 2.14 100 NULL > 3.004 50 NULL > 1.2 200 NULL > Time taken: 0.048 seconds, Fetched: 5 row(s) > Rather than returning NULL, it should give error or round off. 
> I understand that, I can use Double instead of decimal or can cast it, but > still returning "Null" will make many things go unnoticed. > hive> desc mul_table2; > OK > prc double > vol decimal(14,10) > Time taken: 0.049 seconds, Fetched: 2 row(s) > hive> select * from mul_table2; > OK > 1.4 200 > 1.34 200 > 7.34 100 > 7454533.354544100 > Time taken: 0.028 seconds, Fetched: 4 row(s) > hive> select prc, vol, prc*vol as cost from mul_table3; > OK > 7.34 100 734.0 > 7.34 10007340.0 > 1.000410001000.4 > 7454533.354544100 7.454533354544E8 <- Wrong result > 7454533.35454410007.454533354544E9 <- Wrong result > Time taken: 0.025 seconds, Fetched: 5 row(s) > Casting: > hive> select prc, vol, cast(prc*vol as decimal(38,38)) as cost from > mul_table3; > OK > 7.34 100 NULL > 7.34 1000NULL > 1.00041000NULL > 7454533.354544100 NULL > 7454533.3545441000NULL > Time taken: 0.033 seconds, Fetched: 5 row(s) > hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from > mul_table3; > OK > 7.34 100 734 > 7.34 10007340 > 1.000410001000.4 > 7454533.354544100 745453335.4544 > 7454533.35454410007454533354.544 > Time taken: 0.026 seconds, Fetched: 5 row(s) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14568) Hive Decimal Returns NULL
[ https://issues.apache.org/jira/browse/HIVE-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503847#comment-15503847 ] Akhil Chalamalasetty commented on HIVE-14568: - Thanks Zhang. We will work around this issue by casting the column to a lower precision & scale. Since we have a few developers migrating from ORACLE and Postgres SQL, we thought this would be a feature request to ease the usage of Hive. Please let us know if there is a way to introduce such a mode on Hive and if that would have any performance impact once implemented. Regards, Akhil -- This message was sent by Atlassian JIRA (v6.3.4#6332)
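The NULLs in the report follow from how the result type of a decimal multiply is derived. This is a simplified sketch of that derivation (assuming the pre-adjustment rule the thread describes; actual Hive versions differ in how they trim scale on overflow, so treat it as illustrative): for decimal(38,28) * decimal(38,10) the required scale alone is 38, and with precision capped at 38 there is no room left for integer digits, so any product with a nonzero integer part overflows and surfaces as NULL.

```python
def decimal_multiply_type(p1, s1, p2, s2, max_precision=38):
    # The exact product of decimal(p1,s1) * decimal(p2,s2) needs
    # precision p1 + p2 + 1 and scale s1 + s2; Hive caps precision
    # at 38 without (in this sketch) reducing the scale.
    precision = min(p1 + p2 + 1, max_precision)
    scale = s1 + s2
    return precision, scale

# decimal(38,28) * decimal(38,10) -> decimal(38,38): zero integer
# digits, so 1.2 * 200 cannot be represented and becomes NULL.
assert decimal_multiply_type(38, 28, 38, 10) == (38, 38)
```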
[jira] [Assigned] (HIVE-13703) "msck repair" on table with non-partition subdirectories reporting partitions not in metastore
[ https://issues.apache.org/jira/browse/HIVE-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alina Abramova reassigned HIVE-13703: - Assignee: Alina Abramova > "msck repair" on table with non-partition subdirectories reporting partitions > not in metastore > -- > > Key: HIVE-13703 > URL: https://issues.apache.org/jira/browse/HIVE-13703 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0, 1.0.0, 1.2.1 >Reporter: Ana Gillan >Assignee: Alina Abramova > > PROBLEM: Subdirectories created with UNION ALL are listed in {{show > partitions}} output, but show up as {{Partitions not in metastore}} in {{msck > repair}} output. > STEPS TO REPRODUCE: Table created from {{CTAS ... UNION ALL}} DDL > {code} > hive> msck repair table meter_001; > OK > Partitions not in metastore: meter_001:tech_datestamp=2016-03-09/1 > meter_001:tech_datestamp=2016-03-09/2 meter_001:tech_datestamp=2016-03-10/1 > meter_001:tech_datestamp=2016-03-10/2 meter_001:tech_datestamp=2016-03-11/1 > meter_001:tech_datestamp=2016-03-11/2 meter_001:tech_datestamp=2016-03-12/1 > meter_001:tech_datestamp=2016-03-12/2 meter_001:tech_datestamp=2016-03-13/1 > meter_001:tech_datestamp=2016-03-13/2 meter_001:tech_datestamp=2016-03-14/1 > meter_001:tech_datestamp=2016-03-14/2 meter_001:tech_datestamp=2016-03-15/1 > meter_001:tech_datestamp=2016-03-15/2 meter_001:tech_datestamp=2016-03-16/1 > meter_001:tech_datestamp=2016-03-16/2 meter_001:tech_datestamp=2016-03-17/1 > meter_001:tech_datestamp=2016-03-17/2 meter_001:tech_datestamp=2016-03-18/1 > meter_001:tech_datestamp=2016-03-18/2 meter_001:tech_datestamp=2016-03-19/1 > meter_001:tech_datestamp=2016-03-19/2 meter_001:tech_datestamp=2016-03-20/1 > meter_001:tech_datestamp=2016-03-20/2 meter_001:tech_datestamp=2016-03-21/1 > meter_001:tech_datestamp=2016-03-21/2 meter_001:tech_datestamp=2016-03-22/1 > meter_001:tech_datestamp=2016-03-22/2 meter_001:tech_datestamp=2016-03-23/1 > meter_001:tech_datestamp=2016-03-23/2 
meter_001:tech_datestamp=2016-03-24/1 > meter_001:tech_datestamp=2016-03-24/2 meter_001:tech_datestamp=2016-03-25/1 > meter_001:tech_datestamp=2016-03-25/2 meter_001:tech_datestamp=2016-03-26/1 > meter_001:tech_datestamp=2016-03-26/2 meter_001:tech_datestamp=2016-03-27/1 > meter_001:tech_datestamp=2016-03-27/2 meter_001:tech_datestamp=2016-03-28/1 > meter_001:tech_datestamp=2016-03-28/2 meter_001:tech_datestamp=2016-03-29/1 > meter_001:tech_datestamp=2016-03-29/2 meter_001:tech_datestamp=2016-03-30/1 > meter_001:tech_datestamp=2016-03-30/2 meter_001:tech_datestamp=2016-03-31/1 > meter_001:tech_datestamp=2016-03-31/2 meter_001:tech_datestamp=2016-04-01/1 > meter_001:tech_datestamp=2016-04-01/2 meter_001:tech_datestamp=2016-04-02/1 > meter_001:tech_datestamp=2016-04-02/2 meter_001:tech_datestamp=2016-04-03/1 > meter_001:tech_datestamp=2016-04-03/2 meter_001:tech_datestamp=2016-04-04/1 > meter_001:tech_datestamp=2016-04-04/2 meter_001:tech_datestamp=2016-04-05/1 > meter_001:tech_datestamp=2016-04-05/2 meter_001:tech_datestamp=2016-04-06/1 > meter_001:tech_datestamp=2016-04-06/2 > Time taken: 15.996 seconds, Fetched: 1 row(s) > {code} > {code} > hive> show partitions meter_001; > OK > tech_datestamp=2016-03-09 > tech_datestamp=2016-03-10 > tech_datestamp=2016-03-11 > tech_datestamp=2016-03-12 > tech_datestamp=2016-03-13 > tech_datestamp=2016-03-14 > tech_datestamp=2016-03-15 > tech_datestamp=2016-03-16 > tech_datestamp=2016-03-17 > tech_datestamp=2016-03-18 > tech_datestamp=2016-03-19 > tech_datestamp=2016-03-20 > tech_datestamp=2016-03-21 > tech_datestamp=2016-03-22 > tech_datestamp=2016-03-23 > tech_datestamp=2016-03-24 > tech_datestamp=2016-03-25 > tech_datestamp=2016-03-26 > tech_datestamp=2016-03-27 > tech_datestamp=2016-03-28 > tech_datestamp=2016-03-29 > tech_datestamp=2016-03-30 > tech_datestamp=2016-03-31 > tech_datestamp=2016-04-01 > tech_datestamp=2016-04-02 > tech_datestamp=2016-04-03 > tech_datestamp=2016-04-04 > tech_datestamp=2016-04-05 > 
tech_datestamp=2016-04-06 > Time taken: 0.417 seconds, Fetched: 29 row(s) > {code} > Ideally, msck repair should ignore a subdirectory when the additional partition > column doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
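The behaviour the reporter asks for — treat a directory as a partition only if every path component is a key=value pair matching the table's partition keys, in order — can be sketched as follows (hypothetical helper, not Hive's actual msck code):

```python
# Hypothetical sketch of the filtering msck repair could apply: accept a
# directory path only when its components line up one-to-one with the
# table's partition keys as key=value pairs.
def is_valid_partition_path(path: str, partition_keys: list) -> bool:
    parts = path.strip("/").split("/")
    if len(parts) != len(partition_keys):
        return False  # extra levels, e.g. the "/1", "/2" UNION ALL leaves
    return all("=" in part and part.split("=", 1)[0] == key
               for part, key in zip(parts, partition_keys))

keys = ["tech_datestamp"]
print(is_valid_partition_path("tech_datestamp=2016-03-09", keys))    # True
print(is_valid_partition_path("tech_datestamp=2016-03-09/1", keys))  # False
```

With such a filter, the `tech_datestamp=.../1` and `.../2` subdirectories produced by UNION ALL would simply be skipped instead of reported as "Partitions not in metastore".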
[jira] [Commented] (HIVE-14098) Logging task properties, and environment variables might contain passwords
[ https://issues.apache.org/jira/browse/HIVE-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503747#comment-15503747 ] Hive QA commented on HIVE-14098: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829192/HIVE-14098.2-branch-2.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 253 failed/errored test(s), 10352 tests executed *Failed tests:* {noformat} 249_TestHWISessionManager - did not produce a TEST-*.xml file 382_TestMsgBusConnection - did not produce a TEST-*.xml file 771_TestHiveDruidQueryBasedInputFormat - did not produce a TEST-*.xml file 772_TestDruidSerDe - did not produce a TEST-*.xml file 782_TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file 783_TestJdbcWithMiniKdc - did not produce a TEST-*.xml file 784_TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file 786_TestJdbcWithDBTokenStore - did not produce a TEST-*.xml file 787_TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file 788_TestJdbcNonKrbSASLWithMiniKdc - did not produce a TEST-*.xml file 790_TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_9 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_outer_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_json_serde1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_b
[jira] [Updated] (HIVE-14186) Display the UDF exception message in MapReduce in beeline console
[ https://issues.apache.org/jira/browse/HIVE-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14186: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks Yongzhi for reviewing. > Display the UDF exception message in MapReduce in beeline console > -- > > Key: HIVE-14186 > URL: https://issues.apache.org/jira/browse/HIVE-14186 > Project: Hive > Issue Type: Improvement > Components: Beeline >Reporter: Aihua Xu >Assignee: Aihua Xu > Fix For: 2.2.0 > > Attachments: HIVE-14186.1.patch > > > Currently when Mapper or Reducer fails, the beeline console will print the > following error. > Error: Error while processing statement: FAILED: Execution Error, return code > 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2) > It would be helpful if we can print the exceptions from the mapreduce to the > beeline console directly so you don't need to dig into the MR log to find it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2496) Allow ALTER TABLE RENAME between schemas
[ https://issues.apache.org/jira/browse/HIVE-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503603#comment-15503603 ] Ian Cook commented on HIVE-2496: I believe this issue was resolved in Hive 0.14.0, so it should be closed. > Allow ALTER TABLE RENAME between schemas > > > Key: HIVE-2496 > URL: https://issues.apache.org/jira/browse/HIVE-2496 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Patrick Angeles > Attachments: HIVE-2496.1.patch, HIVE-2496.2.patch > > > Currently, this is not allowed, which is unfortunate: > ALTER TABLE db1.foo RENAME TO db2.foo ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"
[ https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503484#comment-15503484 ] Gabor Szadovszky commented on HIVE-14714: - Thanks a lot for the hint, [~lirui]. The fix of [HIVE-13895] should solve the waiting problem. However, if child.waitFor() is interrupted and the related process still generates some output, the IOException in the redirector threads would be logged. (It might occur if the related Spark configs are modified.) I think these exceptions might be misleading. So, I would make a minimal modification to swallow these IOExceptions when we are about to stop the remote driver (isAlive is false). What do you think? > Finishing Hive on Spark causes "java.io.IOException: Stream closed" > --- > > Key: HIVE-14714 > URL: https://issues.apache.org/jira/browse/HIVE-14714 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.1.0 >Reporter: Gabor Szadovszky >Assignee: Gabor Szadovszky > Attachments: HIVE-14714.2.patch, HIVE-14714.patch > > > After executing a hive command with Spark, finishing the beeline session or > even switching the engine causes an IOException. The following was triggered with Ctrl-D to > finish the session, but "!quit" or even "set hive.execution.engine=mr;" causes > the issue. > From HS2 log: > {code} > 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: > [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote > driver, interrupting... > 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: > [Driver]: Waiting thread interrupted, killing child process. > 2016-09-06 16:15:12,296 WARN org.apache.hive.spark.client.SparkClientImpl: > [stderr-redir-1]: Error in redirector thread. 
> java.io.IOException: Stream closed > at > java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:272) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:154) > at java.io.BufferedReader.readLine(BufferedReader.java:317) > at java.io.BufferedReader.readLine(BufferedReader.java:382) > at > org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
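Gabor's proposal — swallow the stream-closed error in the redirector once the client has deliberately decided to kill the child — can be sketched in miniature (Python for illustration, not Hive's actual Java `SparkClientImpl$Redirector`; the class and flag names here are hypothetical):

```python
import io

class Redirector:
    """Illustrative sketch of the proposed change: keep copying child
    output, but once the client has deliberately decided to kill the
    child (alive flag cleared), an error from the closed stream is
    swallowed instead of logged."""
    def __init__(self, stream):
        self.stream = stream
        self.alive = True   # cleared when we are stopping the remote driver
        self.lines = []
        self.errors = []    # stands in for the WARN log entries

    def run(self):
        try:
            for line in self.stream:
                self.lines.append(line)
        except ValueError as e:  # Python's closed-file analogue of IOException
            if self.alive:
                self.errors.append(e)  # unexpected failure: still log it
            # else: expected during shutdown, swallow silently

s = io.StringIO("some driver output\n")
r = Redirector(s)
r.alive = False   # we are about to kill the child process
s.close()         # the kill closes the stream under the redirector
r.run()
print(r.errors)   # no misleading shutdown-time error is recorded
```

The key design point is that the same exception is treated as noise or as a real problem purely based on whether the shutdown was intentional.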
[jira] [Commented] (HIVE-14753) Track the number of open/closed/abandoned sessions in HS2
[ https://issues.apache.org/jira/browse/HIVE-14753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503496#comment-15503496 ] Barna Zsombor Klara commented on HIVE-14753: Posted on rb: https://reviews.apache.org/r/52029/ > Track the number of open/closed/abandoned sessions in HS2 > - > > Key: HIVE-14753 > URL: https://issues.apache.org/jira/browse/HIVE-14753 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > We should be able to track the number of sessions since the startup of the HS2 > instance, as well as the average lifetime of a session. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14098) Logging task properties, and environment variables might contain passwords
[ https://issues.apache.org/jira/browse/HIVE-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-14098: -- Attachment: HIVE-14098.2-branch-2.1.patch Just for QA test, no real change > Logging task properties, and environment variables might contain passwords > -- > > Key: HIVE-14098 > URL: https://issues.apache.org/jira/browse/HIVE-14098 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Logging, Spark >Affects Versions: 2.1.0 >Reporter: Peter Vary >Assignee: Peter Vary > Fix For: 2.2.0 > > Attachments: HIVE-14098-branch-2.1.patch, > HIVE-14098.2-branch-2.1.patch, HIVE-14098.2.patch, HIVE-14098.patch > > > Hive MapredLocalTask Can Print Environment Passwords, like > -Djavax.net.ssl.trustStorePassword. > The same could happen, when logging spark properties -- This message was sent by Atlassian JIRA (v6.3.4#6332)
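The kind of redaction this issue calls for — mask any property or environment value whose key looks secret before it reaches the log — can be sketched as follows (illustrative Python, not the actual Java change; the key pattern is an assumption):

```python
import re

# Hypothetical illustration of the redaction idea: hide values of keys that
# look like secrets before the properties/environment are logged.
SECRET_KEY_PATTERN = re.compile(r"password|secret|token", re.IGNORECASE)

def redact(env: dict) -> dict:
    """Return a copy safe for logging, with secret-looking values masked."""
    return {k: ("*****" if SECRET_KEY_PATTERN.search(k) else v)
            for k, v in env.items()}

env = {"JAVA_OPTS": "-Xmx1g",
       "javax.net.ssl.trustStorePassword": "hunter2"}
print(redact(env))
# trustStorePassword's value comes out masked, JAVA_OPTS is untouched
```

Matching on key names rather than values is the usual choice here, since secret values have no reliable shape of their own.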
[jira] [Updated] (HIVE-14487) Add REBUILD statement for materialized views
[ https://issues.apache.org/jira/browse/HIVE-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14487: --- Assignee: (was: Alan Gates) > Add REBUILD statement for materialized views > > > Key: HIVE-14487 > URL: https://issues.apache.org/jira/browse/HIVE-14487 > Project: Hive > Issue Type: Sub-task > Components: Materialized views >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez > > Support for rebuilding existing materialized views. The statement is the > following: > {code:sql} > ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14484) Extensions for initial materialized views implementation
[ https://issues.apache.org/jira/browse/HIVE-14484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14484: --- Issue Type: Improvement (was: Bug) > Extensions for initial materialized views implementation > > > Key: HIVE-14484 > URL: https://issues.apache.org/jira/browse/HIVE-14484 > Project: Hive > Issue Type: Improvement > Components: Materialized views >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Follow-up of HIVE-14249. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"
[ https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503061#comment-15503061 ] Rui Li commented on HIVE-14714: --- bq. These threads are running in HS2; therefore, they won't be terminated when beeline is closed. Yeah, but if we use CLI, these threads run in the CLI. Then we may lose some output from spark-submit after CLI exits. Thinking more about this, I think the problem is more specific to yarn-cluster mode, right? Because in yarn-client mode, RemoteDriver runs in spark-submit, so it should shut down properly. For yarn-cluster mode, spark-submit is just a monitor for the Spark app. It may be acceptable to lose some output from it. But on the other hand, the user can set {{spark.yarn.submit.waitAppCompletion=false}} so that spark-submit exits after the app is submitted, in order to avoid this hanging issue. HIVE-13895 actually made this the default. I wonder if that should be enough for the issue. > Finishing Hive on Spark causes "java.io.IOException: Stream closed" > --- > > Key: HIVE-14714 > URL: https://issues.apache.org/jira/browse/HIVE-14714 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.1.0 >Reporter: Gabor Szadovszky >Assignee: Gabor Szadovszky > Attachments: HIVE-14714.2.patch, HIVE-14714.patch > > > After executing a hive command with Spark, finishing the beeline session or > even switching the engine causes an IOException. The following was triggered with Ctrl-D to > finish the session, but "!quit" or even "set hive.execution.engine=mr;" causes > the issue. > From HS2 log: > {code} > 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: > [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote > driver, interrupting... > 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: > [Driver]: Waiting thread interrupted, killing child process. 
> 2016-09-06 16:15:12,296 WARN org.apache.hive.spark.client.SparkClientImpl: > [stderr-redir-1]: Error in redirector thread. > java.io.IOException: Stream closed > at > java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:272) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:154) > at java.io.BufferedReader.readLine(BufferedReader.java:317) > at java.io.BufferedReader.readLine(BufferedReader.java:382) > at > org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
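The workaround mentioned in the comment above — letting spark-submit exit right after the app is submitted in yarn-cluster mode — can be applied per Hive session; a sketch, assuming the spark.* property is forwarded to the Spark application as other Spark settings are:

```sql
-- Set before running the query; relevant when the Spark app is
-- submitted in yarn-cluster mode (spark-submit only monitors the app).
set spark.yarn.submit.waitAppCompletion=false;
```

With this, the spark-submit process no longer lingers as a monitor, so there is less output to lose and no redirector hanging on it at shutdown.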
[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"
[ https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502706#comment-15502706 ] Gabor Szadovszky commented on HIVE-14714: - Hi [~lirui], # The root cause of the spark-submit hang is that the refresh interval for checking the process might be set so large that it won't pick up the new state of the remote driver in time. This value can be modified by the user; therefore, I would like to handle this situation. # These threads are running in HS2; therefore, they won't be terminated when beeline is closed. The only effect on beeline is that it doesn't have to wait for the timeout, as the method stop() will return immediately. (In case HS2 is running in embedded mode, these threads will be terminated, but that was the original behaviour, which I haven't changed.) > Finishing Hive on Spark causes "java.io.IOException: Stream closed" > --- > > Key: HIVE-14714 > URL: https://issues.apache.org/jira/browse/HIVE-14714 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.1.0 >Reporter: Gabor Szadovszky >Assignee: Gabor Szadovszky > Attachments: HIVE-14714.2.patch, HIVE-14714.patch > > > After executing a hive command with Spark, finishing the beeline session or > even switching the engine causes an IOException. The following was triggered with Ctrl-D to > finish the session, but "!quit" or even "set hive.execution.engine=mr;" causes > the issue. > From HS2 log: > {code} > 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: > [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote > driver, interrupting... > 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: > [Driver]: Waiting thread interrupted, killing child process. > 2016-09-06 16:15:12,296 WARN org.apache.hive.spark.client.SparkClientImpl: > [stderr-redir-1]: Error in redirector thread. 
> java.io.IOException: Stream closed > at > java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:272) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:154) > at java.io.BufferedReader.readLine(BufferedReader.java:317) > at java.io.BufferedReader.readLine(BufferedReader.java:382) > at > org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-14777) Add support of Spark-2.0.0 in Hive-2.X.X
[ https://issues.apache.org/jira/browse/HIVE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li resolved HIVE-14777. --- Resolution: Duplicate Fix Version/s: (was: 2.2.0) > Add support of Spark-2.0.0 in Hive-2.X.X > > > Key: HIVE-14777 > URL: https://issues.apache.org/jira/browse/HIVE-14777 > Project: Hive > Issue Type: Wish >Reporter: Oleksiy Sayankin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14777) Add support of Spark-2.0.0 in Hive-2.X.X
[ https://issues.apache.org/jira/browse/HIVE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502667#comment-15502667 ] Rui Li commented on HIVE-14777: --- Closing this one as a dup of HIVE-14029. > Add support of Spark-2.0.0 in Hive-2.X.X > > > Key: HIVE-14777 > URL: https://issues.apache.org/jira/browse/HIVE-14777 > Project: Hive > Issue Type: Wish >Reporter: Oleksiy Sayankin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14785) return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
[ https://issues.apache.org/jira/browse/HIVE-14785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502557#comment-15502557 ] vinitkumar commented on HIVE-14785: --- This is the error it's showing. "Log Type: stderr Log UpLoadTime: 19-Sep-2016 07:01:21 Log Length: 88 Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster Log Type: stdout Log UpLoadTime: 19-Sep-2016 07:01:21 Log Length: 0" > return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask > --- > > Key: HIVE-14785 > URL: https://issues.apache.org/jira/browse/HIVE-14785 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.1 > Environment: Hortonworks, Talend, Hive >Reporter: vinitkumar > > Hi, > I am creating a partitioned ORC table in Hive using Talend. But after executing the > job I am getting the error: > Error while processing statement: FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.mr.MapRedTask > Can you please suggest what the issue could be? > Thanks, > Vinitkumar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14785) return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
[ https://issues.apache.org/jira/browse/HIVE-14785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502553#comment-15502553 ] vinitkumar commented on HIVE-14785: --- I opened the log file. It's giving an error like "Log Type: stderr Log UpLoadTime: 19-Sep-2016 07:01:21 Log Length: 88 Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster Log Type: stdout Log UpLoadTime: 19-Sep-2016 07:01:21 Log Length: 0 " > return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask > --- > > Key: HIVE-14785 > URL: https://issues.apache.org/jira/browse/HIVE-14785 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.1 > Environment: Hortonworks, Talend, Hive >Reporter: vinitkumar > > Hi, > I am creating a partitioned ORC table in Hive using Talend. But after executing the > job I am getting the error: > Error while processing statement: FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.mr.MapRedTask > Can you please suggest what the issue could be? > Thanks, > Vinitkumar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
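The "Could not find or load main class ...MRAppMaster" error in the quoted log is commonly reported when NodeManagers cannot resolve the MapReduce framework classes, so one frequently suggested thing to check is the application classpath configuration. A sketch of `yarn.application.classpath` in yarn-site.xml follows; the values are generic Apache-layout illustrations only, and the correct paths are distribution-specific (Hortonworks installs differ):

```xml
<!-- yarn-site.xml: illustrative only; adjust paths to your distribution -->
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
  </value>
</property>
```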