[jira] [Updated] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on

2016-09-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14783:

Status: Patch Available  (was: Reopened)

> bucketing column should be part of sorting for delete/update operation when 
> spdo is on
> --
>
> Key: HIVE-14783
> URL: https://issues.apache.org/jira/browse/HIVE-14783
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Transactions
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-14783.1.patch, HIVE-14783.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on

2016-09-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reopened HIVE-14783:
-

Missed updating Select operator in Reducer.

> bucketing column should be part of sorting for delete/update operation when 
> spdo is on
> --
>
> Key: HIVE-14783
> URL: https://issues.apache.org/jira/browse/HIVE-14783
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Transactions
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-14783.1.patch, HIVE-14783.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on

2016-09-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14783:

Attachment: HIVE-14783.1.patch

> bucketing column should be part of sorting for delete/update operation when 
> spdo is on
> --
>
> Key: HIVE-14783
> URL: https://issues.apache.org/jira/browse/HIVE-14783
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Transactions
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-14783.1.patch, HIVE-14783.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505429#comment-15505429
 ] 

Sergio Peña edited comment on HIVE-14793 at 9/20/16 3:13 AM:
-

Thanks [~sseth]. A couple of comments:

1. Can we create a new function that checks and/or initializes environment 
variables? 
I think this would be useful for new devs when looking at what config variables 
can be used.

2. --outputDir is not necessary anymore. 
I fixed the test-results issue in HIVE-14790, and also in the Jenkins job 
config.




was (Author: spena):
Thanks [~sseth]. A couple of comments:



> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505429#comment-15505429
 ] 

Sergio Peña commented on HIVE-14793:


Thanks [~sseth]. A couple of comments:



> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14624) LLAP: Use FQDN when submitting work to LLAP

2016-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505376#comment-15505376
 ] 

Hive QA commented on HIVE-14624:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829283/HIVE-14624.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1235/testReport
Console output: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1235/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1235/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829283 - jenkins-PreCommit-HIVE-Build

> LLAP: Use FQDN when submitting work to LLAP 
> 
>
> Key: HIVE-14624
> URL: https://issues.apache.org/jira/browse/HIVE-14624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch, 
> HIVE-14624.03.patch, HIVE-14624.patch
>
>
> {code}
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> + socketAddress.getHostName());
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> host = socketAddress.getHostName();
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:  
> public static String getHostName() {
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:   
>return InetAddress.getLocalHost().getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> String name = address.getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> builder.setAmHost(address.getHostName());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: 
>nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), 
> localAddress.get().getPort());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
> localAddress.get().getHostName(), vertex.getDagName(), 
> qIdProto.getDagIndex(),
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
>   new ExecutionContextImpl(localAddress.get().getHostName()), env,
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: 
>String hostName = MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:
> .setBindAddress(addr.getHostName())
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java:
>   request.getContainerIdString(), executionContext.getHostName(), 
> vertex.getDagName(),
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>String displayName = "LlapDaemonCacheMetrics-" + 
> MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java:
>   new LlapProtocolClientImpl(new Configuration(), 
> serverAddr.getHostName(),
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java:
> builder.setAmHost(getAddress().getHostName());
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:
>   String displayName = "LlapTaskSchedulerMetrics-" + 
> MetricsUtils.getHostName();
> {code}
> In systems where the hostnames do not match the FQDN, calling 
> getCanonicalHostName() will allow for resolution of the hostname when 
> accessing from a different base domain.
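
For illustration, a minimal Java sketch of the distinction; this is not from 
the patch, and the {{HostNameDemo}} class is made up:

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameDemo {
  public static void main(String[] args) throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    // May return a short name such as "llap-node1" on hosts whose configured
    // hostname is not the FQDN.
    System.out.println("getHostName():          " + local.getHostName());
    // Performs a reverse lookup and returns the fully qualified name, e.g.
    // "llap-node1.example.com", resolvable from other base domains as well.
    System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
  }
}
{code}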

[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-19 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505363#comment-15505363
 ] 

Rui Li commented on HIVE-14714:
---

Hi [~gszadovszky], in that case, how about logging a brief message at DEBUG 
level? I'm just wary of swallowing any exceptions.
[~xuefuz], do you have any thoughts on this?
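
For reference, a sketch of what that suggestion might look like in the 
redirector loop. This is a hypothetical rewrite, not the actual 
{{SparkClientImpl$Redirector}} code, and the class and field names are 
assumptions:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical redirector: forwards child-process output to the log, and
// logs a brief DEBUG message (rather than a WARN with a stack trace, and
// without silently swallowing) when the stream is closed during shutdown.
class Redirector implements Runnable {
  private static final Logger LOG = LoggerFactory.getLogger(Redirector.class);
  private final BufferedReader in;

  Redirector(BufferedReader in) { this.in = in; }

  @Override
  public void run() {
    try {
      String line;
      while ((line = in.readLine()) != null) {
        LOG.info(line); // forward child output
      }
    } catch (IOException e) {
      // Expected when the child process is killed on shutdown.
      LOG.debug("Redirector stream closed: {}", e.getMessage());
    }
  }
}
{code}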

> Finishing Hive on Spark causes "java.io.IOException: Stream closed"
> ---
>
> Key: HIVE-14714
> URL: https://issues.apache.org/jira/browse/HIVE-14714
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
> Attachments: HIVE-14714.2.patch, HIVE-14714.patch
>
>
> After executing a Hive command with Spark, finishing the Beeline session or
> even switching the engine causes an IOException. The exception below was
> triggered by pressing Ctrl-D to end the session, but "!quit" or even "set
> hive.execution.engine=mr;" causes the same issue.
> From HS2 log:
> {code}
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote 
> driver, interrupting...
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [Driver]: Waiting thread interrupted, killing child process.
> 2016-09-06 16:15:12,296 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [stderr-redir-1]: Error in redirector thread.
> java.io.IOException: Stream closed
> at 
> java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:154)
> at java.io.BufferedReader.readLine(BufferedReader.java:317)
> at java.io.BufferedReader.readLine(BufferedReader.java:382)
> at 
> org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2016-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505237#comment-15505237
 ] 

Hive QA commented on HIVE-14792:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829295/HIVE-14792.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1234/testReport
Console output: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1234/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1234/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829295 - jenkins-PreCommit-HIVE-Build

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14792.1.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.
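
A minimal sketch of the proposed planning-time prefetch. The 
{{AvroSchemaPrefetcher}} helper and its wiring below are hypothetical, not 
the attached patch; only the {{avro.schema.url}} / {{avro.schema.literal}} 
property names come from the description:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical planner-side helper: read the schema file once during query
// planning and pin it into the table properties, so that mappers consume
// avro.schema.literal and never open the remote schema file.
class AvroSchemaPrefetcher {
  static void prefetch(Configuration conf, Properties tableProps) throws IOException {
    String url = tableProps.getProperty("avro.schema.url");
    if (url == null || tableProps.getProperty("avro.schema.literal") != null) {
      return; // no remote schema, or a literal is already present
    }
    Path schemaPath = new Path(url);
    try (InputStream in = schemaPath.getFileSystem(conf).open(schemaPath)) {
      Schema schema = new Schema.Parser().parse(in);
      tableProps.setProperty("avro.schema.literal", schema.toString());
    }
  }
}
{code}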



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14779) make DbTxnManager.HeartbeaterThread a daemon

2016-09-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14779:
--
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Thanks Alan for the review

> make DbTxnManager.HeartbeaterThread a daemon
> 
>
> Key: HIVE-14779
> URL: https://issues.apache.org/jira/browse/HIVE-14779
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14779.patch
>
>
> setDaemon(true);
> make heartbeaterThreadPoolSize static 
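
A minimal sketch of the change, assuming a scheduled pool built with a daemon 
{{ThreadFactory}}; the class below is illustrative, not the actual 
{{DbTxnManager}} code:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Illustrative heartbeater pool: daemon threads cannot keep the JVM alive
// on their own, so a dying client no longer heartbeats forever.
class HeartbeaterPool {
  // static, as the description suggests, so the size is shared per JVM
  private static final int heartbeaterThreadPoolSize = 4; // illustrative value

  static ScheduledExecutorService create() {
    return Executors.newScheduledThreadPool(heartbeaterThreadPoolSize, r -> {
      Thread t = new Thread(r, "Heartbeater");
      t.setDaemon(true); // the one-line fix this issue asks for
      return t;
    });
  }
}
{code}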



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2016-09-19 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505144#comment-15505144
 ] 

Shannon Ladymon commented on HIVE-7926:
---

[~sershe], thanks for clarifying that!

I have a few other things I'd like to reword but am not quite sure how. I'd 
appreciate any light you can shed on them:
* In the sentence: “The initial stage of the query is pushed into #LLAP, large 
shuffle is performed in their own containers” - What does "their own 
containers" refer to? Is there only one large shuffle, or multiple shuffles?
* In the sentence: "The node allows parallel execution for multiple query 
fragments from different queries and sessions” - what does "the node" refer to? 
A single LLAP node? 

[~asears], I noticed that the last link on the page, titled ["Try Hive LLAP" | 
http://www.lewuathe.com/blog/2015/08/12/try-hive-llap/] is a broken link. 
Should I delete that link from the page, or is there an updated link you'd like 
to add? Also, the Web Services section is currently blank. Should this section 
be deleted, or is there content you intend to add?

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries; 
> nor is it a separate execution engine like MR or Tez. It can be used by 
> any Hive execution engine, if support is added; in the future, even external 
> products (e.g. Pig) can use it.
> The document with high-level design we are proposing will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14779) make DbTxnManager.HeartbeaterThread a daemon

2016-09-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505075#comment-15505075
 ] 

Alan Gates commented on HIVE-14779:
---

+1, see comments above.

> make DbTxnManager.HeartbeaterThread a daemon
> 
>
> Key: HIVE-14779
> URL: https://issues.apache.org/jira/browse/HIVE-14779
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-14779.patch
>
>
> setDaemon(true);
> make heartbeaterThreadPoolSize static 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14779) make DbTxnManager.HeartbeaterThread a daemon

2016-09-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505073#comment-15505073
 ] 

Alan Gates commented on HIVE-14779:
---

Ok, so I agree this doesn't make the situation any worse, and in fact makes it 
better, since it avoids the case where everything dies except the heartbeat 
thread, which then keeps the VM alive and keeps heartbeating. At some point in 
the future I do think we should solve the issue where a hanging CLI client 
could hang the system.

> make DbTxnManager.HeartbeaterThread a daemon
> 
>
> Key: HIVE-14779
> URL: https://issues.apache.org/jira/browse/HIVE-14779
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-14779.patch
>
>
> setDaemon(true);
> make heartbeaterThreadPoolSize static 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

Status: Patch Available  (was: Open)

Submitting to run tests.

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14792.1.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14794) HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14794:

Attachment: HIVE-14794.1.patch

This patch builds on HIVE-14792. It uses {{SpecialCases}} to prefetch the Avro 
schema.

> HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.
> --
>
> Key: HIVE-14794
> URL: https://issues.apache.org/jira/browse/HIVE-14794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14794.1.patch
>
>
> HIVE-14792 introduces support to modify and add properties to 
> table-parameters during query-planning. It prefetches remote Avro-schema 
> information and stores it in TBLPROPERTIES, under {{avro.schema.literal}}.
> We'll need similar support in {{HCatLoader}} to prevent excessive reads of 
> schema-files in Pig queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14794) HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14794:

Summary: HCatalog support to pre-fetch schema for Avro tables that use 
avro.schema.url.  (was: HCatalog support to pre-fetch for Avro tables that use 
avro.schema.url.)

> HCatalog support to pre-fetch schema for Avro tables that use avro.schema.url.
> --
>
> Key: HIVE-14794
> URL: https://issues.apache.org/jira/browse/HIVE-14794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> HIVE-14792 introduces support to modify and add properties to 
> table-parameters during query-planning. It prefetches remote Avro-schema 
> information and stores it in TBLPROPERTIES, under {{avro.schema.literal}}.
> We'll need similar support in {{HCatLoader}} to prevent excessive reads of 
> schema-files in Pig queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon

2016-09-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14790:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

I committed this to master without review so that we start seeing the test 
results on Jenkins.

> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon
> ---
>
> Key: HIVE-14790
> URL: https://issues.apache.org/jira/browse/HIVE-14790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.2.0
>
> Attachments: HIVE-14790.1.patch
>
>
> NO PRECOMMIT TESTS
> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14341) Altered skewed location is not respected for list bucketing

2016-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505009#comment-15505009
 ] 

Hive QA commented on HIVE-14341:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829272/HIVE-14341.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10555 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_skewed_table]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_skewed_table1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_list_bucket]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[list_bucket_dml_2]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1233/testReport
Console output: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1233/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1233/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829272 - jenkins-PreCommit-HIVE-Build

> Altered skewed location is not respected for list bucketing
> ---
>
> Key: HIVE-14341
> URL: https://issues.apache.org/jira/browse/HIVE-14341
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14341.1.patch, HIVE-14341.2.patch
>
>
> CREATE TABLE list_bucket_single (key STRING, value STRING)
>   SKEWED BY (key) ON (1,5,6) STORED AS DIRECTORIES;
> ALTER TABLE list_bucket_single SET SKEWED LOCATION 
> ("1"="/user/hive/warehouse/hdfs_skewed/new1");
> However, when you insert a row with key 1, the location falls back to the 
> default one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

Attachment: HIVE-14792.1.patch

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14792.1.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

Attachment: (was: HIVE-14792.1.patch)

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-14793:
-

Assignee: Siddharth Seth

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14793:
--
Attachment: HIVE-14793.01.patch

cc [~spena], [~prasanth_j] for review.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
> Attachments: HIVE-14793.01.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14680:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.03.patch, HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however, for that we'd 
> need more involved coordination between nodes, or a central updater to 
> indicate the number of nodes.
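
A rough sketch of the probing idea described above. This is illustrative 
only; {{StableSlotPicker}} is not the actual 
{{HostAffinitySplitLocationProvider}} code:

{code}
import java.util.List;

// Keep placeholders for inactive slots so a split's preferred slot index is
// stable while nodes are down; when that slot is inactive, fall back by
// re-hashing with a different mix and probing again.
class StableSlotPicker {
  static int pickSlot(long splitHash, List<Boolean> slotActive, int maxProbes) {
    int n = slotActive.size();
    long h = splitHash;
    for (int probe = 0; probe < maxProbes; probe++) {
      int slot = (int) Math.floorMod(h, (long) n);
      if (slotActive.get(slot)) {
        return slot;
      }
      h = mix(h + probe + 1); // fall back via a different hash function
    }
    return (int) Math.floorMod(splitHash, (long) n); // give up: stable choice
  }

  private static long mix(long x) { // simple 64-bit mixer (illustrative)
    x ^= x >>> 33; x *= 0xff51afd7ed558ccdL; x ^= x >>> 33;
    return x;
  }
}
{code}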



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504975#comment-15504975
 ] 

Siddharth Seth commented on HIVE-14734:
---

Sure. If you think that's what it is. I'm not sure the outputDir is set 
correctly either.

> Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
> -
>
> Key: HIVE-14734
> URL: https://issues.apache.org/jira/browse/HIVE-14734
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.2.0
>
> Attachments: HIVE-14734.2.patch, HIVE-14734.patch
>
>
> NO PRECOMMIT TESTS
> Currently, to execute tests on a new branch, a manual process must be done:
> 1. Create a new Jenkins job with the new branch name
> 2. Create a patch to jenkins-submit-build.sh with the new branch
> 3. Create a profile properties file on the ptest master with the new branch
> This jira will attempt to automate steps 1 and 2 by detecting the branch 
> profile from a patch to test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

Description: 
Avro tables that use "external" schema files stored on HDFS can cause excessive 
calls to {{FileSystem::open()}}, especially for queries that spawn large 
numbers of mappers.

This is because of the following code in {{AvroSerDe::initialize()}}:

{code:title=AvroSerDe.java|borderStyle=solid}
public void initialize(Configuration configuration, Properties properties) 
throws SerDeException {
// ...
if (hasExternalSchema(properties)
|| columnNameProperty == null || columnNameProperty.isEmpty()
|| columnTypeProperty == null || columnTypeProperty.isEmpty()) {
  schema = determineSchemaOrReturnErrorSchema(configuration, properties);
} else {
  // Get column names and sort order
  columnNames = Arrays.asList(columnNameProperty.split(","));
  columnTypes = 
TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);

  schema = getSchemaFromCols(properties, columnNames, columnTypes, 
columnCommentProperty);
 
properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
 schema.toString());
}
// ...
}
{code}

For tables using {{avro.schema.url}}, every time the SerDe is initialized (i.e. 
at least once per mapper), the schema file is read remotely. For queries with 
thousands of mappers, this leads to a stampede to the handful (3?) datanodes 
that host the schema-file. In the best case, this causes slowdowns.

It would be preferable to distribute the Avro-schema to all mappers as part of 
the job-conf. The alternatives aren't exactly appealing:
# One can't rely solely on the {{column.list.types}} stored in the Hive 
metastore. (HIVE-14789).
# {{avro.schema.literal}} might not always be usable, because of the size-limit 
on table-parameters. The typical size of the Avro-schema file is between 
0.5-3MB, in my limited experience. Bumping the max table-parameter size isn't a 
great solution.

If the {{avro.schema.file}} were read during query-planning, and made available 
as part of table-properties (but not serialized into the metastore), the 
downstream logic will remain largely intact. I have a patch that does this.



  was:
Avro tables that use "external" schema files stored on HDFS can cause excessive 
calls to {{FileSystem::open()}}, especially for queries that spawn large 
numbers of mappers.

This is because of the following code in {{AvroSerDe::initialize()}}:

{code:title=AvroSerDe.java|borderStyle=solid}
public void initialize(Configuration configuration, Properties properties) 
throws SerDeException {
// ...
if (hasExternalSchema(properties)
|| columnNameProperty == null || columnNameProperty.isEmpty()
|| columnTypeProperty == null || columnTypeProperty.isEmpty()) {
  schema = determineSchemaOrReturnErrorSchema(configuration, properties);
} else {
  // Get column names and sort order
  columnNames = Arrays.asList(columnNameProperty.split(","));
  columnTypes = 
TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);

  schema = getSchemaFromCols(properties, columnNames, columnTypes, 
columnCommentProperty);
 
properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
 schema.toString());
}
// ...
}
{code}

For files using {{avro.schema.url}}, every time the SerDe is initialized (i.e. 
at least once per mapper), the schema file is read remotely. For queries with 
thousands of mappers, this leads to a stampede to the handful (3?) datanodes 
that host the schema-file. In the best case, this causes slowdowns.

It would be preferable to distribute the Avro-schema to all mappers as part of 
the job-conf. The alternatives aren't exactly appealing:
# One can't rely solely on the {{column.list.types}} stored in the Hive 
metastore. (HIVE-14789).
# {{avro.schema.literal}} might not always be usable, because of the size-limit 
on table-parameters. The typical size of the Avro-schema file is between 
0.5-3MB, in my limited experience. Bumping the max table-parameter size isn't a 
great solution.

If the {{avro.schema.file}} were read during query-planning, and made available 
as part of table-properties (but not serialized into the metastore), the 
downstream logic will remain largely intact. I have a patch that does this.




> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: 

[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2016-09-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504961#comment-15504961
 ] 

Sergey Shelukhin commented on HIVE-7926:


That part was not actually written by me; I think it might be a Twitter-like 
tag prefix ;) It can be ignored/dropped.

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries; 
> nor is it a separate execution engine like MR or Tez. It can be used by 
> any Hive execution engine, if support is added; in the future, even external 
> products (e.g. Pig) can use it.
> The document with high-level design we are proposing will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14792:

Attachment: HIVE-14792.1.patch

This patch introduces an optimizer that prefetches the {{avro.schema.url}} 
contents, and modifies the table-info stored in the query-plan to contain the 
schema (as the {{avro.schema.literal}} property). The {{AvroSerDe}} is almost 
completely unchanged, and handles {{avro.schema.literal}} transparently.

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14792.1.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties) 
> throws SerDeException {
> // ...
> if (hasExternalSchema(properties)
> || columnNameProperty == null || columnNameProperty.isEmpty()
> || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>   schema = determineSchemaOrReturnErrorSchema(configuration, properties);
> } else {
>   // Get column names and sort order
>   columnNames = Arrays.asList(columnNameProperty.split(","));
>   columnTypes = 
> TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>   schema = getSchemaFromCols(properties, columnNames, columnTypes, 
> columnCommentProperty);
>  
> properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>  schema.toString());
> }
> // ...
> }
> {code}
> For files using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5-3MB, in my limited experience. Bumping the max table-parameter 
> size isn't a great solution.
> If the {{avro.schema.file}} were read during query-planning, and made 
> available as part of table-properties (but not serialized into the 
> metastore), the downstream logic will remain largely intact. I have a patch 
> that does this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2016-09-19 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504935#comment-15504935
 ] 

Shannon Ladymon commented on HIVE-7926:
---

[~sershe], can you clarify why the # is used in #LLAP in the [design doc | 
https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf]?
 

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries; 
> nor is it a separate execution engine like MR or Tez. It can be used by 
> any Hive execution engine, if support is added; in the future, even external 
> products (e.g. Pig) can use it.
> The document with high-level design we are proposing will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14700) clean up file/txn information via a metastore thread

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14700:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to the feature branch.

> clean up file/txn information via a metastore thread
> 
>
> Key: HIVE-14700
> URL: https://issues.apache.org/jira/browse/HIVE-14700
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
> Attachments: HIVE-14700.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14700) clean up file/txn information via a metastore thread

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14700:

Summary: clean up file/txn information via a metastore thread  (was: clean 
up file/txn information via a metastore thread similar to compactor)

> clean up file/txn information via a metastore thread
> 
>
> Key: HIVE-14700
> URL: https://issues.apache.org/jira/browse/HIVE-14700
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
> Attachments: HIVE-14700.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14791) LLAP: Use FQDN when submitting work to LLAP

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14791:

Summary: LLAP: Use FQDN when submitting work to LLAP   (was: LLAP: Use FQDN 
for all communication )

> LLAP: Use FQDN when submitting work to LLAP 
> 
>
> Key: HIVE-14791
> URL: https://issues.apache.org/jira/browse/HIVE-14791
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
>
> {code}
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> + socketAddress.getHostName());
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> host = socketAddress.getHostName();
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:  
> public static String getHostName() {
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:   
>return InetAddress.getLocalHost().getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> String name = address.getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> builder.setAmHost(address.getHostName());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: 
>nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), 
> localAddress.get().getPort());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
> localAddress.get().getHostName(), vertex.getDagName(), 
> qIdProto.getDagIndex(),
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
>   new ExecutionContextImpl(localAddress.get().getHostName()), env,
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: 
>String hostName = MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:
> .setBindAddress(addr.getHostName())
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java:
>   request.getContainerIdString(), executionContext.getHostName(), 
> vertex.getDagName(),
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>String displayName = "LlapDaemonCacheMetrics-" + 
> MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java:
>   new LlapProtocolClientImpl(new Configuration(), 
> serverAddr.getHostName(),
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java:
> builder.setAmHost(getAddress().getHostName());
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:
>   String displayName = "LlapTaskSchedulerMetrics-" + 
> MetricsUtils.getHostName();
> {code}
> In systems where the hostnames do not match FQDN, calling the 
> getCanonicalHostName() will allow for resolution of the hostname when 
> accessing from a different base domain.
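
A minimal standalone probe of the distinction described above (the class name and example hosts are hypothetical, not from any patch):

{code:java}
import java.net.InetAddress;

public class HostNameProbe {
    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        // may return the short host name (e.g. "node1"), depending on OS/DNS setup
        System.out.println("getHostName():          " + local.getHostName());
        // performs a reverse lookup and returns the FQDN (e.g. "node1.example.com")
        // when DNS is configured, so it resolves from a different base domain
        System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
    }
}
{code}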



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14624) LLAP: Use FQDN when submitting work to LLAP

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14624:

Summary: LLAP: Use FQDN when submitting work to LLAP   (was: LLAP: Use FQDN 
for all communication )

> LLAP: Use FQDN when submitting work to LLAP 
> 
>
> Key: HIVE-14624
> URL: https://issues.apache.org/jira/browse/HIVE-14624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch, 
> HIVE-14624.03.patch, HIVE-14624.patch
>
>
> {code}
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> + socketAddress.getHostName());
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> host = socketAddress.getHostName();
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:  
> public static String getHostName() {
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:   
>return InetAddress.getLocalHost().getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> String name = address.getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> builder.setAmHost(address.getHostName());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: 
>nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), 
> localAddress.get().getPort());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
> localAddress.get().getHostName(), vertex.getDagName(), 
> qIdProto.getDagIndex(),
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
>   new ExecutionContextImpl(localAddress.get().getHostName()), env,
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: 
>String hostName = MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:
> .setBindAddress(addr.getHostName())
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java:
>   request.getContainerIdString(), executionContext.getHostName(), 
> vertex.getDagName(),
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>String displayName = "LlapDaemonCacheMetrics-" + 
> MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java:
>   new LlapProtocolClientImpl(new Configuration(), 
> serverAddr.getHostName(),
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java:
> builder.setAmHost(getAddress().getHostName());
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:
>   String displayName = "LlapTaskSchedulerMetrics-" + 
> MetricsUtils.getHostName();
> {code}
> In systems where the hostnames do not match FQDN, calling the 
> getCanonicalHostName() will allow for resolution of the hostname when 
> accessing from a different base domain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14791) LLAP: Use FQDN for all communication

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14791:

Summary: LLAP: Use FQDN for all communication  (was: LLAP: Use FQDN when 
submitting work to LLAP )

> LLAP: Use FQDN for all communication
> 
>
> Key: HIVE-14791
> URL: https://issues.apache.org/jira/browse/HIVE-14791
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
>
> {code}
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> + socketAddress.getHostName());
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> host = socketAddress.getHostName();
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:  
> public static String getHostName() {
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:   
>return InetAddress.getLocalHost().getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> String name = address.getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> builder.setAmHost(address.getHostName());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: 
>nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), 
> localAddress.get().getPort());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
> localAddress.get().getHostName(), vertex.getDagName(), 
> qIdProto.getDagIndex(),
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
>   new ExecutionContextImpl(localAddress.get().getHostName()), env,
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: 
>String hostName = MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:
> .setBindAddress(addr.getHostName())
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java:
>   request.getContainerIdString(), executionContext.getHostName(), 
> vertex.getDagName(),
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>String displayName = "LlapDaemonCacheMetrics-" + 
> MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java:
>   new LlapProtocolClientImpl(new Configuration(), 
> serverAddr.getHostName(),
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java:
> builder.setAmHost(getAddress().getHostName());
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:
>   String displayName = "LlapTaskSchedulerMetrics-" + 
> MetricsUtils.getHostName();
> {code}
> In systems where the hostnames do not match FQDN, calling the 
> getCanonicalHostName() will allow for resolution of the hostname when 
> accessing from a different base domain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14624) LLAP: Use FQDN for all communication

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14624:

Attachment: HIVE-14624.03.patch

The unit test failure is a test-specific issue with a mock. Updated the patch.
[~sseth] will clone and rename this one, since all the discussion has been here.

> LLAP: Use FQDN for all communication 
> -
>
> Key: HIVE-14624
> URL: https://issues.apache.org/jira/browse/HIVE-14624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch, 
> HIVE-14624.03.patch, HIVE-14624.patch
>
>
> {code}
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> + socketAddress.getHostName());
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> host = socketAddress.getHostName();
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:  
> public static String getHostName() {
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:   
>return InetAddress.getLocalHost().getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> String name = address.getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> builder.setAmHost(address.getHostName());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: 
>nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), 
> localAddress.get().getPort());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
> localAddress.get().getHostName(), vertex.getDagName(), 
> qIdProto.getDagIndex(),
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
>   new ExecutionContextImpl(localAddress.get().getHostName()), env,
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: 
>String hostName = MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:
> .setBindAddress(addr.getHostName())
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java:
>   request.getContainerIdString(), executionContext.getHostName(), 
> vertex.getDagName(),
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>String displayName = "LlapDaemonCacheMetrics-" + 
> MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java:
>   new LlapProtocolClientImpl(new Configuration(), 
> serverAddr.getHostName(),
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java:
> builder.setAmHost(getAddress().getHostName());
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:
>   String displayName = "LlapTaskSchedulerMetrics-" + 
> MetricsUtils.getHostName();
> {code}
> In systems where the hostnames do not match FQDN, calling the 
> getCanonicalHostName() will allow for resolution of the hostname when 
> accessing from a different base domain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14700) clean up file/txn information via a metastore thread similar to compactor

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14700:

Attachment: (was: HIVE-14700.WIP.patch)

> clean up file/txn information via a metastore thread similar to compactor
> -
>
> Key: HIVE-14700
> URL: https://issues.apache.org/jira/browse/HIVE-14700
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
> Attachments: HIVE-14700.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP

2016-09-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504870#comment-15504870
 ] 

Sergey Shelukhin commented on HIVE-14651:
-

lgtm +1

> Add a local cluster for Tez and LLAP
> 
>
> Key: HIVE-14651
> URL: https://issues.apache.org/jira/browse/HIVE-14651
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, 
> HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon

2016-09-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14790:
---
Description: 
NO PRECOMMIT TESTS

Jenkins is not displaying test results because 'set -e' is aborting the script 
too soon

  was:Jenkins is not displaying test results because 'set -e' is aborting the 
script too soon


> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon
> ---
>
> Key: HIVE-14790
> URL: https://issues.apache.org/jira/browse/HIVE-14790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14790.1.patch
>
>
> NO PRECOMMIT TESTS
> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon

2016-09-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14790:
---
Status: Patch Available  (was: Open)

> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon
> ---
>
> Key: HIVE-14790
> URL: https://issues.apache.org/jira/browse/HIVE-14790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14790.1.patch
>
>
> NO PRECOMMIT TESTS
> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon

2016-09-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504848#comment-15504848
 ] 

Sergio Peña commented on HIVE-14790:


[~sseth] This is the patch.

> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon
> ---
>
> Key: HIVE-14790
> URL: https://issues.apache.org/jira/browse/HIVE-14790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14790.1.patch
>
>
> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon

2016-09-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña reassigned HIVE-14790:
--

Assignee: Sergio Peña

> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon
> ---
>
> Key: HIVE-14790
> URL: https://issues.apache.org/jira/browse/HIVE-14790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14790.1.patch
>
>
> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon

2016-09-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14790:
---
Attachment: HIVE-14790.1.patch

> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon
> ---
>
> Key: HIVE-14790
> URL: https://issues.apache.org/jira/browse/HIVE-14790
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
> Attachments: HIVE-14790.1.patch
>
>
> Jenkins is not displaying test results because 'set -e' is aborting the 
> script too soon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh

2016-09-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504831#comment-15504831
 ] 

Sergio Peña commented on HIVE-14734:


[~sseth] I know what it is. 

This is the last part of the script.
{noformat}
call_ptest_server --testHandle "$TEST_HANDLE" --endpoint "$PTEST_API_ENDPOINT" 
--logsEndpoint "$PTEST_LOG_ENDPOINT" \
--profile "$BUILD_PROFILE" ${optionalArgs[@]} "$@"
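# note: with 'set -e' in effect, a non-zero exit status from call_ptest_server
# aborts the script right here, so the unpack below never runs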

ret=$?

unpack_test_results

exit $ret
{noformat}

The {{set -e}} at the beginning of the file aborts the script when 
{{call_ptest_server}} returns a non-zero value, so the script never unpacks the 
test results.
I'll create a quick patch to remove this.

> Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
> -
>
> Key: HIVE-14734
> URL: https://issues.apache.org/jira/browse/HIVE-14734
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.2.0
>
> Attachments: HIVE-14734.2.patch, HIVE-14734.patch
>
>
> NO PRECOMMIT TESTS
> Currently, to execute tests on a new branch, a manual process must be done:
> 1. Create a new Jenkins job with the new branch name
> 2. Create a patch to jenkins-submit-build.sh with the new branch
> 3. Create a profile properties file on the ptest master with the new branch
> This jira will attempt to automate steps 1 and 2 by detecting the branch 
> profile from a patch to test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504614#comment-15504614
 ] 

Siddharth Seth edited comment on HIVE-14734 at 9/19/16 9:39 PM:


[~spena] - test results are no longer available on Jenkins runs. Investigating, 
but I suspect it may be because of this jira.


was (Author: sseth):
[~spena] - test results are no longer available on Hadoop. Investigating, but I 
suspect it may be because of this jira.

> Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
> -
>
> Key: HIVE-14734
> URL: https://issues.apache.org/jira/browse/HIVE-14734
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.2.0
>
> Attachments: HIVE-14734.2.patch, HIVE-14734.patch
>
>
> NO PRECOMMIT TESTS
> Currently, to execute tests on a new branch, a manual process must be done:
> 1. Create a new Jenkins job with the new branch name
> 2. Create a patch to jenkins-submit-build.sh with the new branch
> 3. Create a profile properties file on the ptest master with the new branch
> This jira will attempt to automate steps 1 and 2 by detecting the branch 
> profile from a patch to test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14341) Altered skewed location is not respected for list bucketing

2016-09-19 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14341:

Attachment: HIVE-14341.2.patch

Patch-2: made the changes so the desc command will also show the skewed 
locations that were not updated explicitly.

With this patch, we no longer automatically collect the skew mapping from the 
directory layout, since that would produce the wrong answer whenever a location 
has been updated explicitly.

Rather, given a query like {{select * from list_bucket_single where key=1}}: if 
the skewed location for key 1 was updated explicitly, we use the new location 
from HMS; otherwise, we check the default location /list_bucket_single/key=1.
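
A minimal sketch of that lookup order (class and field names here are 
hypothetical, not from the patch):

{code:java}
import java.util.Map;
import org.apache.hadoop.fs.Path;

// Prefer an explicitly altered skewed location recorded in HMS; otherwise
// fall back to the default <tableDir>/key=<value> directory.
public class SkewedLocationResolver {
    private final Map<String, Path> hmsSkewedLocations; // from ALTER ... SET SKEWED LOCATION
    private final Path tableDir;

    SkewedLocationResolver(Map<String, Path> hmsSkewedLocations, Path tableDir) {
        this.hmsSkewedLocations = hmsSkewedLocations;
        this.tableDir = tableDir;
    }

    Path resolve(String skewedValue) {
        Path explicit = hmsSkewedLocations.get(skewedValue);
        return explicit != null ? explicit : new Path(tableDir, "key=" + skewedValue);
    }
}
{code}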

> Altered skewed location is not respected for list bucketing
> ---
>
> Key: HIVE-14341
> URL: https://issues.apache.org/jira/browse/HIVE-14341
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14341.1.patch, HIVE-14341.2.patch
>
>
> CREATE TABLE list_bucket_single (key STRING, value STRING)
>   SKEWED BY (key) ON (1,5,6) STORED AS DIRECTORIES;
> alter table list_bucket_single set skewed location 
> ("1"="/user/hive/warehouse/hdfs_skewed/new1");
> However, when you insert a row with key 1, the location falls back to the 
> default one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14789) Avro Table-reads bork when using SerDe-generated table-schema.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14789:

Attachment: HIVE-14789-reproduce.patch

This attachment has a qfile-test that reproduces the error I'm talking about, 
including a scrubbed data-file that's readable with the schema-literal, but not 
without it. 

This was a fairly common failure at Yahoo. Our current recommendation is for 
users to only use Avro tables with the schema-file with which they were 
produced. The metastore-based schema is to be ignored entirely.

I've already tried modifying how the Avro schema is generated from 
{{columns.list.types}}, but I find that the conversions (to and fro) are lossy, 
brittle and unreliable. :/
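
A minimal sketch of a guard for this (assuming both the declared 
{{avro.schema.literal}} string and the schema the reader actually derived are 
in hand; names are hypothetical):

{code:java}
import org.apache.avro.Schema;

// If the metastore-derived schema drifts from the declared literal, fail fast
// instead of reading data with a lossily reconstructed schema.
public class SchemaDriftCheck {
    public static boolean sameSchema(String declaredLiteral, Schema derived) {
        Schema declared = new Schema.Parser().parse(declaredLiteral);
        return declared.equals(derived); // Avro schema equality is structural
    }
}
{code}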

> Avro Table-reads bork when using SerDe-generated table-schema.
> --
>
> Key: HIVE-14789
> URL: https://issues.apache.org/jira/browse/HIVE-14789
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.1, 2.0.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14789-reproduce.patch
>
>
> AvroSerDe allows one to skip the table-columns in a table-definition when 
> creating a table, as long as the TBLPROPERTIES includes a valid 
> {{avro.schema.url}} or {{avro.schema.literal}}. The table-columns are 
> inferred from processing the Avro schema file/literal.
> The problem is that the inferred schema might not be congruent with the 
> actual schema in the Avro schema file/literal. Consider the following table 
> definition:
> {code:sql}
> CREATE TABLE avro_schema_break_1
> ROW FORMAT
> SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> TBLPROPERTIES ('avro.schema.literal'='{
>   "type": "record",
>   "name": "Messages",
>   "namespace": "net.myth",
>   "fields": [
> {
>   "name": "header",
>   "type": [
> "null",
> {
>   "type": "record",
>   "name": "HeaderInfo",
>   "fields": [
> {
>   "name": "inferred_event_type",
>   "type": [
> "null",
> "string"
>   ],
>   "default": null
> },
> {
>   "name": "event_type",
>   "type": [
> "null",
> "string"
>   ],
>   "default": null
> },
> {
>   "name": "event_version",
>   "type": [
> "null",
> "string"
>   ],
>   "default": null
> }
>   ]
> }
>   ]
> },
> {
>   "name": "messages",
>   "type": {
> "type": "array",
> "items": {
>   "name": "MessageInfo",
>   "type": "record",
>   "fields": [
> {
>   "name": "message_id",
>   "type": [
> "null",
> "string"
>   ],
>   "doc": "Message-ID"
> },
> {
>   "name": "received_date",
>   "type": [
> "null",
> "long"
>   ],
>   "doc": "Received Date"
> },
> {
>   "name": "sent_date",
>   "type": [
> "null",
> "long"
>   ]
> },
> {
>   "name": "from_name",
>   "type": [
> "null",
> "string"
>   ]
> },
> {
>   "name": "flags",
>   "type": [
> "null",
> {
>   "type": "record",
>   "name": "Flags",
>   "fields": [
> {
>   "name": "is_seen",
>   "type": [
> "null",
> "boolean"
>   ],
>   "default": null
> },
> {
>   "name": "is_read",
>   "type": [
> "null",
> "boolean"
>   ],
>   "default": null
> },
> {
>   "name": "is_flagged",
>   "type": [
> "null",
> "boolean"
>   ],
>   "default": null
> }
> 

[jira] [Assigned] (HIVE-14789) Avro Table-reads bork when using SerDe-generated table-schema.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-14789:
---

Assignee: Mithun Radhakrishnan

> Avro Table-reads bork when using SerDe-generated table-schema.
> --
>
> Key: HIVE-14789
> URL: https://issues.apache.org/jira/browse/HIVE-14789
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.1, 2.0.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> AvroSerDe allows one to skip the table-columns in a table-definition when 
> creating a table, as long as the TBLPROPERTIES includes a valid 
> {{avro.schema.url}} or {{avro.schema.literal}}. The table-columns are 
> inferred from processing the Avro schema file/literal.
> The problem is that the inferred schema might not be congruent with the 
> actual schema in the Avro schema file/literal. Consider the following table 
> definition:
> {code:sql}
> CREATE TABLE avro_schema_break_1
> ROW FORMAT
> SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> TBLPROPERTIES ('avro.schema.literal'='{
>   "type": "record",
>   "name": "Messages",
>   "namespace": "net.myth",
>   "fields": [
> {
>   "name": "header",
>   "type": [
> "null",
> {
>   "type": "record",
>   "name": "HeaderInfo",
>   "fields": [
> {
>   "name": "inferred_event_type",
>   "type": [
> "null",
> "string"
>   ],
>   "default": null
> },
> {
>   "name": "event_type",
>   "type": [
> "null",
> "string"
>   ],
>   "default": null
> },
> {
>   "name": "event_version",
>   "type": [
> "null",
> "string"
>   ],
>   "default": null
> }
>   ]
> }
>   ]
> },
> {
>   "name": "messages",
>   "type": {
> "type": "array",
> "items": {
>   "name": "MessageInfo",
>   "type": "record",
>   "fields": [
> {
>   "name": "message_id",
>   "type": [
> "null",
> "string"
>   ],
>   "doc": "Message-ID"
> },
> {
>   "name": "received_date",
>   "type": [
> "null",
> "long"
>   ],
>   "doc": "Received Date"
> },
> {
>   "name": "sent_date",
>   "type": [
> "null",
> "long"
>   ]
> },
> {
>   "name": "from_name",
>   "type": [
> "null",
> "string"
>   ]
> },
> {
>   "name": "flags",
>   "type": [
> "null",
> {
>   "type": "record",
>   "name": "Flags",
>   "fields": [
> {
>   "name": "is_seen",
>   "type": [
> "null",
> "boolean"
>   ],
>   "default": null
> },
> {
>   "name": "is_read",
>   "type": [
> "null",
> "boolean"
>   ],
>   "default": null
> },
> {
>   "name": "is_flagged",
>   "type": [
> "null",
> "boolean"
>   ],
>   "default": null
> }
>   ]
> }
>   ],
>   "default": null
> }
>   ]
> }
>   }
> }
>   ]
> }');
> {code}
> This produces a table with the following schema:
> {noformat}
> 2016-09-19T13:23:42,934 DEBUG [0ce7e586-13ea-4390-ac2a-6dac36e8a216 main] 
> hive.log: DDL: struct avro_schema_break_1 { 
> struct<inferred_event_type:string,event_type:string,event_version:string> 
> header, 
> list<struct<message_id:string,received_date:bigint,sent_date:bigint,from_name:string,flags:struct<is_seen:boolean,is_read:boolean,is_flagged:boolean>>>
>  messages}
> {noformat}
> Data written to 

[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504614#comment-15504614
 ] 

Siddharth Seth commented on HIVE-14734:
---

[~spena] - test results are no longer available on Hadoop. Investigating, but I 
suspect it may be because of this jira.

> Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
> -
>
> Key: HIVE-14734
> URL: https://issues.apache.org/jira/browse/HIVE-14734
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.2.0
>
> Attachments: HIVE-14734.2.patch, HIVE-14734.patch
>
>
> NO PRECOMMIT TESTS
> Currently, to execute tests on a new branch, a manual process must be done:
> 1. Create a new Jenkins job with the new branch name
> 2. Create a patch to jenkins-submit-build.sh with the new branch
> 3. Create a profile properties file on the ptest master with the new branch
> This jira will attempt to automate steps 1 and 2 by detecting the branch 
> profile from a patch to test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504607#comment-15504607
 ] 

Hive QA commented on HIVE-14680:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829250/HIVE-14680.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10554 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1232/testReport
Console output: 
https://builds.apache.org/job/jenkins-PreCommit-HIVE-Build/1232/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/jenkins-PreCommit-HIVE-Build-1232/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829250 - jenkins-PreCommit-HIVE-Build

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.03.patch, HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however for that we'd 
> need more involved coordination between nodes or a central updater to 
> indicate the number of nodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504604#comment-15504604
 ] 

Siddharth Seth commented on HIVE-14461:
---

Ping [~prasanth_j] for review.

> Investigate HBaseMinimrCliDriver tests
> --
>
> Key: HIVE-14461
> URL: https://issues.apache.org/jira/browse/HIVE-14461
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
> Attachments: HIVE-14461.01.patch
>
>
> during HIVE-1 I've encountered an odd thing:
> HBaseMinimrCliDriver only executes a single test... and that test is set 
> using the qfile selector... which looks out-of-place.
> The only test it executes doesn't follow the regular qtest file naming... and 
> has an extension 'm'.
> At least the file should be renamed... but I think the change wasn't 
> intentional



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14788) Investigate how to access permanent function with restarting HS2 if load balancer is configured

2016-09-19 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504515#comment-15504515
 ] 

BELUGA BEHR commented on HIVE-14788:


The current option, as I understand it:

# Copy target JAR file to each host into Hive auxiliary directory
# Restart one HiveServer2 instance
# Connect directly to that one HiveServer2 instance
# Create the function
# Restart the rest of the HiveServer2 instances

Restarting the rest of the HiveServer2 instances will cause them to pick up the 
new JAR file from the auxiliary directory and also re-load the list of 
functions from the HMS.

This can perhaps be improved with a read-through cache for functions, with a 
timeout (TTL) on each entry.  When a function is encountered and is not in the 
cache, HS2 can attempt to fetch the function information from the HMS.  The 
timeout matters because when a user drops a function, that function also needs 
a way to age out of the HS2 caches (see the sketch below).
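
A minimal sketch of such a cache (assuming Guava on the classpath; 
{{FunctionInfo}} and {{loadFromMetastore}} are hypothetical stand-ins for the 
real HMS lookup):

{code:java}
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class FunctionCacheSketch {
    // Read-through: a cache miss triggers an HMS lookup; the TTL lets a
    // dropped function eventually disappear from every HS2 instance.
    private final LoadingCache<String, FunctionInfo> cache =
        CacheBuilder.newBuilder()
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .build(new CacheLoader<String, FunctionInfo>() {
                @Override
                public FunctionInfo load(String functionName) {
                    return loadFromMetastore(functionName);
                }
            });

    FunctionInfo lookup(String functionName) throws Exception {
        return cache.get(functionName);
    }

    private FunctionInfo loadFromMetastore(String name) {
        return new FunctionInfo(); // HMS call elided in this sketch
    }

    static class FunctionInfo { }
}
{code}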

With this setup, the process for adding a function is simplified:

# Copy target JAR file to each host into Hive auxiliary directory
# Restart all HiveServer2 instances
# Connect to any HiveServer2 instance through the load balancer
# Create the function


> Investigate how to access permanent function with restarting HS2 if load 
> balancer is configured
> ---
>
> Key: HIVE-14788
> URL: https://issues.apache.org/jira/browse/HIVE-14788
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> When load balancer is configured for multiple HS2 servers, seems we need to 
> restart each HS2 server to get permanent function to work. Since the command 
> "reload function" issued from the client to refresh the global registry may 
> is not targeted to a specific HS2 server, some servers may not get refreshed 
> and ClassNotFoundException may be thrown later.
> Investigate if it's an issue and a good solution for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13703) "msck repair" on table with non-partition subdirectories reporting partitions not in metastore

2016-09-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504433#comment-15504433
 ] 

Sergey Shelukhin commented on HIVE-13703:
-

Would this be fixed by HIVE-14511, or otherwise should it use a similar 
approach (looking for the expected directory structure in the first place, 
rather than catching errors)?
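
A minimal sketch of that structure-first approach (names are hypothetical):

{code:java}
// Only descend into directories named <nextPartitionCol>=<value>; anything
// else (e.g. the /1, /2 subdirectories produced by UNION ALL) is skipped
// instead of being reported as a partition not in the metastore.
public class PartitionDirFilter {
    static boolean matchesPartitionColumn(String dirName, String expectedCol) {
        int eq = dirName.indexOf('=');
        return eq > 0 && dirName.substring(0, eq).equalsIgnoreCase(expectedCol);
    }
}
{code}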

> "msck repair" on table with non-partition subdirectories reporting partitions 
> not in metastore
> --
>
> Key: HIVE-13703
> URL: https://issues.apache.org/jira/browse/HIVE-13703
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.2.1
>Reporter: Ana Gillan
>Assignee: Alina Abramova
> Attachments: HIVE-13703.patch
>
>
> PROBLEM: Subdirectories created with UNION ALL are listed in {{show 
> partitions}} output, but show up as {{Partitions not in metastore}} in {{msck 
> repair}} output. 
> STEPS TO REPRODUCE: Table created from {{CTAS ... UNION ALL}} DDL
> {code}
> hive> msck repair table meter_001; 
> OK 
> Partitions not in metastore: meter_001:tech_datestamp=2016-03-09/1 
> meter_001:tech_datestamp=2016-03-09/2 meter_001:tech_datestamp=2016-03-10/1 
> meter_001:tech_datestamp=2016-03-10/2 meter_001:tech_datestamp=2016-03-11/1 
> meter_001:tech_datestamp=2016-03-11/2 meter_001:tech_datestamp=2016-03-12/1 
> meter_001:tech_datestamp=2016-03-12/2 meter_001:tech_datestamp=2016-03-13/1 
> meter_001:tech_datestamp=2016-03-13/2 meter_001:tech_datestamp=2016-03-14/1 
> meter_001:tech_datestamp=2016-03-14/2 meter_001:tech_datestamp=2016-03-15/1 
> meter_001:tech_datestamp=2016-03-15/2 meter_001:tech_datestamp=2016-03-16/1 
> meter_001:tech_datestamp=2016-03-16/2 meter_001:tech_datestamp=2016-03-17/1 
> meter_001:tech_datestamp=2016-03-17/2 meter_001:tech_datestamp=2016-03-18/1 
> meter_001:tech_datestamp=2016-03-18/2 meter_001:tech_datestamp=2016-03-19/1 
> meter_001:tech_datestamp=2016-03-19/2 meter_001:tech_datestamp=2016-03-20/1 
> meter_001:tech_datestamp=2016-03-20/2 meter_001:tech_datestamp=2016-03-21/1 
> meter_001:tech_datestamp=2016-03-21/2 meter_001:tech_datestamp=2016-03-22/1 
> meter_001:tech_datestamp=2016-03-22/2 meter_001:tech_datestamp=2016-03-23/1 
> meter_001:tech_datestamp=2016-03-23/2 meter_001:tech_datestamp=2016-03-24/1 
> meter_001:tech_datestamp=2016-03-24/2 meter_001:tech_datestamp=2016-03-25/1 
> meter_001:tech_datestamp=2016-03-25/2 meter_001:tech_datestamp=2016-03-26/1 
> meter_001:tech_datestamp=2016-03-26/2 meter_001:tech_datestamp=2016-03-27/1 
> meter_001:tech_datestamp=2016-03-27/2 meter_001:tech_datestamp=2016-03-28/1 
> meter_001:tech_datestamp=2016-03-28/2 meter_001:tech_datestamp=2016-03-29/1 
> meter_001:tech_datestamp=2016-03-29/2 meter_001:tech_datestamp=2016-03-30/1 
> meter_001:tech_datestamp=2016-03-30/2 meter_001:tech_datestamp=2016-03-31/1 
> meter_001:tech_datestamp=2016-03-31/2 meter_001:tech_datestamp=2016-04-01/1 
> meter_001:tech_datestamp=2016-04-01/2 meter_001:tech_datestamp=2016-04-02/1 
> meter_001:tech_datestamp=2016-04-02/2 meter_001:tech_datestamp=2016-04-03/1 
> meter_001:tech_datestamp=2016-04-03/2 meter_001:tech_datestamp=2016-04-04/1 
> meter_001:tech_datestamp=2016-04-04/2 meter_001:tech_datestamp=2016-04-05/1 
> meter_001:tech_datestamp=2016-04-05/2 meter_001:tech_datestamp=2016-04-06/1 
> meter_001:tech_datestamp=2016-04-06/2 
> Time taken: 15.996 seconds, Fetched: 1 row(s) 
> {code}
> {code}
> hive> show partitions meter_001; 
> OK 
> tech_datestamp=2016-03-09 
> tech_datestamp=2016-03-10 
> tech_datestamp=2016-03-11 
> tech_datestamp=2016-03-12 
> tech_datestamp=2016-03-13 
> tech_datestamp=2016-03-14 
> tech_datestamp=2016-03-15 
> tech_datestamp=2016-03-16 
> tech_datestamp=2016-03-17 
> tech_datestamp=2016-03-18 
> tech_datestamp=2016-03-19 
> tech_datestamp=2016-03-20 
> tech_datestamp=2016-03-21 
> tech_datestamp=2016-03-22 
> tech_datestamp=2016-03-23 
> tech_datestamp=2016-03-24 
> tech_datestamp=2016-03-25 
> tech_datestamp=2016-03-26 
> tech_datestamp=2016-03-27 
> tech_datestamp=2016-03-28 
> tech_datestamp=2016-03-29 
> tech_datestamp=2016-03-30 
> tech_datestamp=2016-03-31 
> tech_datestamp=2016-04-01 
> tech_datestamp=2016-04-02 
> tech_datestamp=2016-04-03 
> tech_datestamp=2016-04-04 
> tech_datestamp=2016-04-05 
> tech_datestamp=2016-04-06 
> Time taken: 0.417 seconds, Fetched: 29 row(s) 
> {code}
> Ideally msck repair should ignore subdirectory if that additional partition 
> column doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data

2016-09-19 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14251:

Target Version/s: 1.3.0, 2.1.1, 2.0.2

> Union All of different types resolves to incorrect data
> ---
>
> Key: HIVE-14251
> URL: https://issues.apache.org/jira/browse/HIVE-14251
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-14251.1.patch, HIVE-14251.2.patch, 
> HIVE-14251.3.patch, HIVE-14251.4.patch, HIVE-14251.5.patch, HIVE-14251.6.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01',5,1.25);
> select * from 
> (select c1 from src union all
> select c2 from src union all
> select c3 from src) t;
> It will return NULL for the c1 values. Seems the common data type is resolved 
> to the last c3 which is double.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data

2016-09-19 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504407#comment-15504407
 ] 

Aihua Xu commented on HIVE-14251:
-

Yes. It should affect all of them. I will backport to those branches.

> Union All of different types resolves to incorrect data
> ---
>
> Key: HIVE-14251
> URL: https://issues.apache.org/jira/browse/HIVE-14251
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-14251.1.patch, HIVE-14251.2.patch, 
> HIVE-14251.3.patch, HIVE-14251.4.patch, HIVE-14251.5.patch, HIVE-14251.6.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01',5,1.25);
> select * from 
> (select c1 from src union all
> select c2 from src union all
> select c3 from src) t;
> It will return NULL for the c1 values. Seems the common data type is resolved 
> to the last c3 which is double.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on

2016-09-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504358#comment-15504358
 ] 

Sergey Shelukhin commented on HIVE-14783:
-

[~ashutoshc] what is the effect of this bugfix on queries (i.e. what is the 
user-observable behavior that it fixes)?

> bucketing column should be part of sorting for delete/update operation when 
> spdo is on
> --
>
> Key: HIVE-14783
> URL: https://issues.apache.org/jira/browse/HIVE-14783
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Transactions
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-14783.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data

2016-09-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504346#comment-15504346
 ] 

Sergey Shelukhin commented on HIVE-14251:
-

Does this affect branch-1? Having incorrect results should warrant a fix in all 
the branches (2.1, 2.0, 1.3?)

> Union All of different types resolves to incorrect data
> ---
>
> Key: HIVE-14251
> URL: https://issues.apache.org/jira/browse/HIVE-14251
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-14251.1.patch, HIVE-14251.2.patch, 
> HIVE-14251.3.patch, HIVE-14251.4.patch, HIVE-14251.5.patch, HIVE-14251.6.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01',5,1.25);
> select * from 
> (select c1 from src union all
> select c2 from src union all
> select c3 from src) t;
> It will return NULL for the c1 values. Seems the common data type is resolved 
> to the last c3 which is double.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504329#comment-15504329
 ] 

Sergey Shelukhin commented on HIVE-14680:
-

One-byte change.

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.03.patch, HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however for that we'd 
> need more involved coordination between nodes or a central updater to 
> indicate the number of nodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14680:

Attachment: HIVE-14680.03.patch

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.03.patch, HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however for that we'd 
> need more involved coordination between nodes or a central updater to 
> indicate the number of nodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504324#comment-15504324
 ] 

Sergey Shelukhin commented on HIVE-14680:
-

[~sseth] I was assuming the normal block/split boundaries were k * a large-ish 
power of two, so this would suffice.
Apparently there's no such restriction. A +-3 offset can flip another bit; 
however, if we make no assumptions about split boundaries, we cannot tell which 
way the 3 goes (e.g. for 31323, we don't know if it has to be consistent with 
31320 or with 31326). I guess we can just remove an extra bit.
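
A minimal sketch of the masking idea (names and the bit count are illustrative):

{code:java}
public class SplitOffsetNormalizer {
    // Drop enough low-order bits that a start offset and the same offset
    // shifted by a few bytes (e.g. the 3-byte ORC magic) usually normalize to
    // the same value before hashing. As discussed above, a pair that straddles
    // a mask boundary (31323 vs 31320/31326) can still diverge, hence the
    // suggestion to remove an extra bit.
    static long normalize(long startOffset, int bitsToDrop) {
        return startOffset >>> bitsToDrop;
    }
}
{code}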

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however for that we'd 
> need more involved coordination between nodes or a central updater to 
> indicate the number of nodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14734) Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504294#comment-15504294
 ] 

Siddharth Seth commented on HIVE-14734:
---

Very useful, thank you. 
https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/ is the 
new URL for the job (instead of PreCommit-HIVE-master-Build).

> Detect ptest profile and submit to ptest-server from jenkins-execute-build.sh
> -
>
> Key: HIVE-14734
> URL: https://issues.apache.org/jira/browse/HIVE-14734
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.2.0
>
> Attachments: HIVE-14734.2.patch, HIVE-14734.patch
>
>
> NO PRECOMMIT TESTS
> Currently, to execute tests on a new branch, a manual process must be done:
> 1. Create a new Jenkins job with the new branch name
> 2. Create a patch to jenkins-submit-build.sh with the new branch
> 3. Create a profile properties file on the ptest master with the new branch
> This jira will attempt to automate steps 1 and 2 by detecting the branch 
> profile from a patch to test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504305#comment-15504305
 ] 

Gopal V commented on HIVE-14680:


The off-by value is usually the ORC file magic "ORC" (3 bytes). The BI split 
will ignore it and produce (0 + 32MB), while the ETL split will start at the 
1st stripe (3 + 33.99MB).

This is not expected to happen for any split other than stripe #1 of a file.

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however for that we'd 
> need more involved coordination between nodes or a central updater to 
> indicate the number of nodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504287#comment-15504287
 ] 

Siddharth Seth commented on HIVE-14680:
---

bq. As for removing the 2 lowest bits, yes
Let me clarify the question.
Say the block boundary is 30MB. A split generated by reading footers starts at 
30MB + 3 bytes, while a split generated without reading the footer starts at 
exactly 30MB.
Will removing the 2 lowest bits produce the same start offset for both splits? 
Otherwise these splits are not consistent and will not go to the same node. 
30MB is probably a bad example; will this work in all cases (e.g. 32MB - 2 
bytes)?

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however for that we'd 
> need more involved coordination between nodes or a central updater to 
> indicate the number of nodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP

2016-09-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504277#comment-15504277
 ] 

Prasanth Jayachandran commented on HIVE-14651:
--

+1

> Add a local cluster for Tez and LLAP
> 
>
> Key: HIVE-14651
> URL: https://issues.apache.org/jira/browse/HIVE-14651
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, 
> HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP

2016-09-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504276#comment-15504276
 ] 

Prasanth Jayachandran commented on HIVE-14651:
--

That's the one. Makes sense.

> Add a local cluster for Tez and LLAP
> 
>
> Key: HIVE-14651
> URL: https://issues.apache.org/jira/browse/HIVE-14651
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, 
> HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504272#comment-15504272
 ] 

Siddharth Seth commented on HIVE-14651:
---

Which comment are you referring to? (The one which says "Force fs to file://, 
setup staging dir?)

The staging dir is set up within the Hive directories; otherwise Tez takes care 
of creating the dirs based on configuration. Once local mode works properly, 
these comments will get resolved. (I'd like to allow local mode to work with 
either file:// or hdfs; Tez does not support hdfs with local mode yet.)
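
For reference, a rough sketch of the kind of configuration being discussed; "tez.local.mode", "tez.staging-dir" and "fs.defaultFS" are standard Tez/Hadoop keys, while the staging path itself is made up for the example:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class LocalTezConf {
  // Illustrative only: force the local filesystem and an explicit staging
  // dir for a local Tez run; without the explicit dir, Tez derives one
  // from configuration.
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "file:///");                 // local mode works with file:// today
    conf.setBoolean("tez.local.mode", true);              // run the AM in-process
    conf.set("tez.staging-dir", "/tmp/hive-tez-staging"); // hypothetical path
    return conf;
  }
}
{code}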

> Add a local cluster for Tez and LLAP
> 
>
> Key: HIVE-14651
> URL: https://issues.apache.org/jira/browse/HIVE-14651
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, 
> HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14781) ptest killall command does not work

2016-09-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14781:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review [~prasanth_j]


> ptest killall command does not work
> ---
>
> Key: HIVE-14781
> URL: https://issues.apache.org/jira/browse/HIVE-14781
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-14781.01.patch
>
>
> killall -f is not a valid flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP

2016-09-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504223#comment-15504223
 ] 

Prasanth Jayachandran commented on HIVE-14651:
--

[~sseth] regarding your comments in the code: how does it work without setting 
up the staging dir? Is there a default that the Tez AM sets up?

> Add a local cluster for Tez and LLAP
> 
>
> Key: HIVE-14651
> URL: https://issues.apache.org/jira/browse/HIVE-14651
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, 
> HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14651) Add a local cluster for Tez and LLAP

2016-09-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504202#comment-15504202
 ] 

Siddharth Seth commented on HIVE-14651:
---

[~prasanth_j], [~sershe] - the test results look good now. Any other comments?

> Add a local cluster for Tez and LLAP
> 
>
> Key: HIVE-14651
> URL: https://issues.apache.org/jira/browse/HIVE-14651
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, 
> HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on

2016-09-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14783:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

> bucketing column should be part of sorting for delete/update operation when 
> spdo is on
> --
>
> Key: HIVE-14783
> URL: https://issues.apache.org/jira/browse/HIVE-14783
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Transactions
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-14783.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on

2016-09-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504005#comment-15504005
 ] 

Prasanth Jayachandran commented on HIVE-14783:
--

+1

> bucketing column should be part of sorting for delete/update operation when 
> spdo is on
> --
>
> Key: HIVE-14783
> URL: https://issues.apache.org/jira/browse/HIVE-14783
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Transactions
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-14783.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14784) Operation logs are disabled automatically if the parent directory does not exist.

2016-09-19 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503888#comment-15503888
 ] 

Yongzhi Chen commented on HIVE-14784:
-

Had a discussion with Naveen; he will add a warning for old log files that no 
longer exist.
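
A minimal sketch of the defensive check under discussion — recreate the per-session directory if the OS purged it, and warn; the class and method names here are hypothetical, not the attached patch:

{code:java}
import java.io.File;
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OperationLogCreator {
  private static final Logger LOG = LoggerFactory.getLogger(OperationLogCreator.class);

  // Illustrative only: recreate the session log dir (e.g. after a /tmp
  // cleaner removed it) instead of silently disabling operation logging.
  static File createOperationLog(File sessionLogDir, String operationId) throws IOException {
    if (!sessionLogDir.exists() && !sessionLogDir.mkdirs()) {
      LOG.warn("Session log dir {} was deleted and could not be recreated; "
          + "older operation log files no longer exist", sessionLogDir);
      throw new IOException("Cannot recreate " + sessionLogDir);
    }
    File logFile = new File(sessionLogDir, operationId);
    if (!logFile.createNewFile() && !logFile.exists()) {
      throw new IOException("Unable to create operation log file " + logFile);
    }
    return logFile;
  }
}
{code}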

> Operation logs are disabled automatically if the parent directory does not 
> exist.
> -
>
> Key: HIVE-14784
> URL: https://issues.apache.org/jira/browse/HIVE-14784
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-14784.patch
>
>
> Operation logging is disabled automatically for the query if the parent 
> directory (named after the hive session id) that gets created when the 
> session is established gets deleted for any reason. For example, if the 
> operation logdir is under /tmp, which can get purged automatically at a 
> configured interval by the OS.
> Running a query from that session leads to:
> {code}
> 2016-09-15 15:09:16,723 WARN org.apache.hive.service.cli.operation.Operation: 
> Unable to create operation log file: 
> /tmp/hive/operation_logs/b8809985-6b38-47ec-a49b-6158a67cd9fc/d35414f7-2418-426c-8489-c6f643ca4599
> java.io.IOException: No such file or directory
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hive.service.cli.operation.Operation.createOperationLog(Operation.java:195)
>   at 
> org.apache.hive.service.cli.operation.Operation.beforeRun(Operation.java:237)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:255)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:398)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:385)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:490)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This later leads to errors like (more prominent when using HUE as HUE does 
> not close hive sessions and attempts to retrieve the operations logs days 
> after they were created).
> {code}
> WARN org.apache.hive.service.cli.thrift.ThriftCLIService: Error fetching 
> results: 
> org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated 
> with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=d35414f7-2418-426c-8489-c6f643ca4599]
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:259)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:701)
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:676)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) 

[jira] [Updated] (HIVE-13703) "msck repair" on table with non-partition subdirectories reporting partitions not in metastore

2016-09-19 Thread Alina Abramova (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alina Abramova updated HIVE-13703:
--
Attachment: HIVE-13703.patch

This patch fixes the issue. Could somebody review it?

> "msck repair" on table with non-partition subdirectories reporting partitions 
> not in metastore
> --
>
> Key: HIVE-13703
> URL: https://issues.apache.org/jira/browse/HIVE-13703
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.2.1
>Reporter: Ana Gillan
>Assignee: Alina Abramova
> Attachments: HIVE-13703.patch
>
>
> PROBLEM: Subdirectories created with UNION ALL are listed in {{show 
> partitions}} output, but show up as {{Partitions not in metastore}} in {{msck 
> repair}} output. 
> STEPS TO REPRODUCE: Table created from {{CTAS ... UNION ALL}} DDL
> {code}
> hive> msck repair table meter_001; 
> OK 
> Partitions not in metastore: meter_001:tech_datestamp=2016-03-09/1 
> meter_001:tech_datestamp=2016-03-09/2 meter_001:tech_datestamp=2016-03-10/1 
> meter_001:tech_datestamp=2016-03-10/2 meter_001:tech_datestamp=2016-03-11/1 
> meter_001:tech_datestamp=2016-03-11/2 meter_001:tech_datestamp=2016-03-12/1 
> meter_001:tech_datestamp=2016-03-12/2 meter_001:tech_datestamp=2016-03-13/1 
> meter_001:tech_datestamp=2016-03-13/2 meter_001:tech_datestamp=2016-03-14/1 
> meter_001:tech_datestamp=2016-03-14/2 meter_001:tech_datestamp=2016-03-15/1 
> meter_001:tech_datestamp=2016-03-15/2 meter_001:tech_datestamp=2016-03-16/1 
> meter_001:tech_datestamp=2016-03-16/2 meter_001:tech_datestamp=2016-03-17/1 
> meter_001:tech_datestamp=2016-03-17/2 meter_001:tech_datestamp=2016-03-18/1 
> meter_001:tech_datestamp=2016-03-18/2 meter_001:tech_datestamp=2016-03-19/1 
> meter_001:tech_datestamp=2016-03-19/2 meter_001:tech_datestamp=2016-03-20/1 
> meter_001:tech_datestamp=2016-03-20/2 meter_001:tech_datestamp=2016-03-21/1 
> meter_001:tech_datestamp=2016-03-21/2 meter_001:tech_datestamp=2016-03-22/1 
> meter_001:tech_datestamp=2016-03-22/2 meter_001:tech_datestamp=2016-03-23/1 
> meter_001:tech_datestamp=2016-03-23/2 meter_001:tech_datestamp=2016-03-24/1 
> meter_001:tech_datestamp=2016-03-24/2 meter_001:tech_datestamp=2016-03-25/1 
> meter_001:tech_datestamp=2016-03-25/2 meter_001:tech_datestamp=2016-03-26/1 
> meter_001:tech_datestamp=2016-03-26/2 meter_001:tech_datestamp=2016-03-27/1 
> meter_001:tech_datestamp=2016-03-27/2 meter_001:tech_datestamp=2016-03-28/1 
> meter_001:tech_datestamp=2016-03-28/2 meter_001:tech_datestamp=2016-03-29/1 
> meter_001:tech_datestamp=2016-03-29/2 meter_001:tech_datestamp=2016-03-30/1 
> meter_001:tech_datestamp=2016-03-30/2 meter_001:tech_datestamp=2016-03-31/1 
> meter_001:tech_datestamp=2016-03-31/2 meter_001:tech_datestamp=2016-04-01/1 
> meter_001:tech_datestamp=2016-04-01/2 meter_001:tech_datestamp=2016-04-02/1 
> meter_001:tech_datestamp=2016-04-02/2 meter_001:tech_datestamp=2016-04-03/1 
> meter_001:tech_datestamp=2016-04-03/2 meter_001:tech_datestamp=2016-04-04/1 
> meter_001:tech_datestamp=2016-04-04/2 meter_001:tech_datestamp=2016-04-05/1 
> meter_001:tech_datestamp=2016-04-05/2 meter_001:tech_datestamp=2016-04-06/1 
> meter_001:tech_datestamp=2016-04-06/2 
> Time taken: 15.996 seconds, Fetched: 1 row(s) 
> {code}
> {code}
> hive> show partitions meter_001; 
> OK 
> tech_datestamp=2016-03-09 
> tech_datestamp=2016-03-10 
> tech_datestamp=2016-03-11 
> tech_datestamp=2016-03-12 
> tech_datestamp=2016-03-13 
> tech_datestamp=2016-03-14 
> tech_datestamp=2016-03-15 
> tech_datestamp=2016-03-16 
> tech_datestamp=2016-03-17 
> tech_datestamp=2016-03-18 
> tech_datestamp=2016-03-19 
> tech_datestamp=2016-03-20 
> tech_datestamp=2016-03-21 
> tech_datestamp=2016-03-22 
> tech_datestamp=2016-03-23 
> tech_datestamp=2016-03-24 
> tech_datestamp=2016-03-25 
> tech_datestamp=2016-03-26 
> tech_datestamp=2016-03-27 
> tech_datestamp=2016-03-28 
> tech_datestamp=2016-03-29 
> tech_datestamp=2016-03-30 
> tech_datestamp=2016-03-31 
> tech_datestamp=2016-04-01 
> tech_datestamp=2016-04-02 
> tech_datestamp=2016-04-03 
> tech_datestamp=2016-04-04 
> tech_datestamp=2016-04-05 
> tech_datestamp=2016-04-06 
> Time taken: 0.417 seconds, Fetched: 29 row(s) 
> {code}
> Ideally msck repair should ignore subdirectory if that additional partition 
> column doesn't exist.
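
A minimal sketch of the "ignore extra subdirectories" idea — descend only into directories whose name looks like a partition component (col=value); this is an illustration of the approach, not the attached patch:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionDirFilter {
  // Illustrative only: UNION ALL output dirs such as ".../1" and ".../2"
  // carry no "col=value" pattern and are skipped instead of being reported
  // as partitions not in the metastore.
  static List<Path> listPartitionDirs(FileSystem fs, Path dir) throws IOException {
    List<Path> result = new ArrayList<>();
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDirectory() && status.getPath().getName().contains("=")) {
        result.add(status.getPath());
      }
    }
    return result;
  }
}
{code}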



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14568) Hive Decimal Returns NULL

2016-09-19 Thread Akhil Chalamalasetty (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503847#comment-15503847
 ] 

Akhil Chalamalasetty edited comment on HIVE-14568 at 9/19/16 3:58 PM:
--

Thanks for the elaborate explanation, Zhang. We will work around this issue by 
casting the column to a lower precision & scale. 
Since we have a few developers migrating from Oracle and PostgreSQL, we 
thought this would be a feature request to ease the usage of Hive. Please let 
us know if there is a way to introduce such a mode in Hive, and whether it would 
have any performance impact once implemented.

Regards,
Akhil


was (Author: akhilnaidu):
Thanks Zhang. We will workaround this issue by casting the column to a lower 
precision & scale. 
Since we have a few developers migrating from ORACLE and Postgres SQL, we 
thought this would be a feature request to ease the usage of Hive. Please let 
us know if there is a way to introduce such a mode on Hive and if that would 
have a any performance impacts once implemented.

Regards,
AKhil

> Hive Decimal Returns NULL
> -
>
> Key: HIVE-14568
> URL: https://issues.apache.org/jira/browse/HIVE-14568
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0
> Environment: Centos 6.7, Hadoop 2.7.2,hive 1.0.0,2.0
>Reporter: gurmukh singh
>Assignee: Xuefu Zhang
>
> Hi
> I was under the impression that the bug 
> https://issues.apache.org/jira/browse/HIVE-5022 got fixed. But I see the 
> same issue in Hive 1.0 and Hive 1.2 as well.
> hive> desc mul_table;
> OK
> prc   decimal(38,28)
> vol   decimal(38,10)
> Time taken: 0.068 seconds, Fetched: 2 row(s)
> hive> select prc, vol, prc*vol as cost from mul_table;
> OK
> 1.2   200 NULL
> 1.44  200 NULL
> 2.14  100 NULL
> 3.004 50  NULL
> 1.2   200 NULL
> Time taken: 0.048 seconds, Fetched: 5 row(s)
> Rather than returning NULL, it should give an error or round off.
> I understand that I can use double instead of decimal, or can cast it, but 
> still, returning NULL will make many things go unnoticed.
> hive> desc mul_table2;
> OK
> prc   double
> vol   decimal(14,10)
> Time taken: 0.049 seconds, Fetched: 2 row(s)
> hive> select * from mul_table2;
> OK
> 1.4   200
> 1.34  200
> 7.34  100
> 7454533.354544  100
> Time taken: 0.028 seconds, Fetched: 4 row(s)
> hive> select prc, vol, prc*vol  as cost from mul_table3;
> OK
> 7.34  100   734.0
> 7.34  1000  7340.0
> 1.0004  1000  1000.4
> 7454533.354544  100   7.454533354544E8   <- Wrong result
> 7454533.354544  1000  7.454533354544E9   <- Wrong result
> Time taken: 0.025 seconds, Fetched: 5 row(s)
> Casting:
> hive> select prc, vol, cast(prc*vol as decimal(38,38)) as cost from 
> mul_table3;
> OK
> 7.34  100   NULL
> 7.34  1000  NULL
> 1.0004  1000  NULL
> 7454533.354544  100   NULL
> 7454533.354544  1000  NULL
> Time taken: 0.033 seconds, Fetched: 5 row(s)
> hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from 
> mul_table3;
> OK
> 7.34  100   734
> 7.34  1000  7340
> 1.0004  1000  1000.4
> 7454533.354544  100   745453335.4544
> 7454533.354544  1000  7454533354.544
> Time taken: 0.026 seconds, Fetched: 5 row(s) 
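
For reference, a rough sketch of why the NULLs appear, assuming the HIVE-3976 result-type rules (precision = p1 + p2 + 1, scale = s1 + s2, both capped at 38): decimal(38,28) * decimal(38,10) has ideal scale 28 + 10 = 38 and ideal precision 38 + 38 + 1 = 77; after capping both at 38, the integer part is left with 38 - 38 = 0 digits, so any product of at least 1 overflows and Hive returns NULL. Casting to decimal(38,10) leaves 28 integer digits, which is why that workaround succeeds.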



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14568) Hive Decimal Returns NULL

2016-09-19 Thread Akhil Chalamalasetty (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503847#comment-15503847
 ] 

Akhil Chalamalasetty commented on HIVE-14568:
-

Thanks, Zhang. We will work around this issue by casting the column to a lower 
precision & scale. 
Since we have a few developers migrating from Oracle and PostgreSQL, we 
thought this would be a feature request to ease the usage of Hive. Please let 
us know if there is a way to introduce such a mode in Hive, and whether it would 
have any performance impact once implemented.

Regards,
Akhil

> Hive Decimal Returns NULL
> -
>
> Key: HIVE-14568
> URL: https://issues.apache.org/jira/browse/HIVE-14568
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0
> Environment: Centos 6.7, Hadoop 2.7.2,hive 1.0.0,2.0
>Reporter: gurmukh singh
>Assignee: Xuefu Zhang
>
> Hi
> I was under the impression that the bug 
> https://issues.apache.org/jira/browse/HIVE-5022 got fixed. But I see the 
> same issue in Hive 1.0 and Hive 1.2 as well.
> hive> desc mul_table;
> OK
> prc   decimal(38,28)
> vol   decimal(38,10)
> Time taken: 0.068 seconds, Fetched: 2 row(s)
> hive> select prc, vol, prc*vol as cost from mul_table;
> OK
> 1.2   200 NULL
> 1.44  200 NULL
> 2.14  100 NULL
> 3.004 50  NULL
> 1.2   200 NULL
> Time taken: 0.048 seconds, Fetched: 5 row(s)
> Rather than returning NULL, it should give an error or round off.
> I understand that I can use double instead of decimal, or can cast it, but 
> still, returning NULL will make many things go unnoticed.
> hive> desc mul_table2;
> OK
> prc   double
> vol   decimal(14,10)
> Time taken: 0.049 seconds, Fetched: 2 row(s)
> hive> select * from mul_table2;
> OK
> 1.4   200
> 1.34  200
> 7.34  100
> 7454533.354544  100
> Time taken: 0.028 seconds, Fetched: 4 row(s)
> hive> select prc, vol, prc*vol  as cost from mul_table3;
> OK
> 7.34  100   734.0
> 7.34  1000  7340.0
> 1.0004  1000  1000.4
> 7454533.354544  100   7.454533354544E8   <- Wrong result
> 7454533.354544  1000  7.454533354544E9   <- Wrong result
> Time taken: 0.025 seconds, Fetched: 5 row(s)
> Casting:
> hive> select prc, vol, cast(prc*vol as decimal(38,38)) as cost from 
> mul_table3;
> OK
> 7.34  100   NULL
> 7.34  1000  NULL
> 1.0004  1000  NULL
> 7454533.354544  100   NULL
> 7454533.354544  1000  NULL
> Time taken: 0.033 seconds, Fetched: 5 row(s)
> hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from 
> mul_table3;
> OK
> 7.34  100   734
> 7.34  1000  7340
> 1.0004  1000  1000.4
> 7454533.354544  100   745453335.4544
> 7454533.354544  1000  7454533354.544
> Time taken: 0.026 seconds, Fetched: 5 row(s) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13703) "msck repair" on table with non-partition subdirectories reporting partitions not in metastore

2016-09-19 Thread Alina Abramova (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alina Abramova reassigned HIVE-13703:
-

Assignee: Alina Abramova

> "msck repair" on table with non-partition subdirectories reporting partitions 
> not in metastore
> --
>
> Key: HIVE-13703
> URL: https://issues.apache.org/jira/browse/HIVE-13703
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.0.0, 1.2.1
>Reporter: Ana Gillan
>Assignee: Alina Abramova
>
> PROBLEM: Subdirectories created with UNION ALL are listed in {{show 
> partitions}} output, but show up as {{Partitions not in metastore}} in {{msck 
> repair}} output. 
> STEPS TO REPRODUCE: Table created from {{CTAS ... UNION ALL}} DDL
> {code}
> hive> msck repair table meter_001; 
> OK 
> Partitions not in metastore: meter_001:tech_datestamp=2016-03-09/1 
> meter_001:tech_datestamp=2016-03-09/2 meter_001:tech_datestamp=2016-03-10/1 
> meter_001:tech_datestamp=2016-03-10/2 meter_001:tech_datestamp=2016-03-11/1 
> meter_001:tech_datestamp=2016-03-11/2 meter_001:tech_datestamp=2016-03-12/1 
> meter_001:tech_datestamp=2016-03-12/2 meter_001:tech_datestamp=2016-03-13/1 
> meter_001:tech_datestamp=2016-03-13/2 meter_001:tech_datestamp=2016-03-14/1 
> meter_001:tech_datestamp=2016-03-14/2 meter_001:tech_datestamp=2016-03-15/1 
> meter_001:tech_datestamp=2016-03-15/2 meter_001:tech_datestamp=2016-03-16/1 
> meter_001:tech_datestamp=2016-03-16/2 meter_001:tech_datestamp=2016-03-17/1 
> meter_001:tech_datestamp=2016-03-17/2 meter_001:tech_datestamp=2016-03-18/1 
> meter_001:tech_datestamp=2016-03-18/2 meter_001:tech_datestamp=2016-03-19/1 
> meter_001:tech_datestamp=2016-03-19/2 meter_001:tech_datestamp=2016-03-20/1 
> meter_001:tech_datestamp=2016-03-20/2 meter_001:tech_datestamp=2016-03-21/1 
> meter_001:tech_datestamp=2016-03-21/2 meter_001:tech_datestamp=2016-03-22/1 
> meter_001:tech_datestamp=2016-03-22/2 meter_001:tech_datestamp=2016-03-23/1 
> meter_001:tech_datestamp=2016-03-23/2 meter_001:tech_datestamp=2016-03-24/1 
> meter_001:tech_datestamp=2016-03-24/2 meter_001:tech_datestamp=2016-03-25/1 
> meter_001:tech_datestamp=2016-03-25/2 meter_001:tech_datestamp=2016-03-26/1 
> meter_001:tech_datestamp=2016-03-26/2 meter_001:tech_datestamp=2016-03-27/1 
> meter_001:tech_datestamp=2016-03-27/2 meter_001:tech_datestamp=2016-03-28/1 
> meter_001:tech_datestamp=2016-03-28/2 meter_001:tech_datestamp=2016-03-29/1 
> meter_001:tech_datestamp=2016-03-29/2 meter_001:tech_datestamp=2016-03-30/1 
> meter_001:tech_datestamp=2016-03-30/2 meter_001:tech_datestamp=2016-03-31/1 
> meter_001:tech_datestamp=2016-03-31/2 meter_001:tech_datestamp=2016-04-01/1 
> meter_001:tech_datestamp=2016-04-01/2 meter_001:tech_datestamp=2016-04-02/1 
> meter_001:tech_datestamp=2016-04-02/2 meter_001:tech_datestamp=2016-04-03/1 
> meter_001:tech_datestamp=2016-04-03/2 meter_001:tech_datestamp=2016-04-04/1 
> meter_001:tech_datestamp=2016-04-04/2 meter_001:tech_datestamp=2016-04-05/1 
> meter_001:tech_datestamp=2016-04-05/2 meter_001:tech_datestamp=2016-04-06/1 
> meter_001:tech_datestamp=2016-04-06/2 
> Time taken: 15.996 seconds, Fetched: 1 row(s) 
> {code}
> {code}
> hive> show partitions meter_001; 
> OK 
> tech_datestamp=2016-03-09 
> tech_datestamp=2016-03-10 
> tech_datestamp=2016-03-11 
> tech_datestamp=2016-03-12 
> tech_datestamp=2016-03-13 
> tech_datestamp=2016-03-14 
> tech_datestamp=2016-03-15 
> tech_datestamp=2016-03-16 
> tech_datestamp=2016-03-17 
> tech_datestamp=2016-03-18 
> tech_datestamp=2016-03-19 
> tech_datestamp=2016-03-20 
> tech_datestamp=2016-03-21 
> tech_datestamp=2016-03-22 
> tech_datestamp=2016-03-23 
> tech_datestamp=2016-03-24 
> tech_datestamp=2016-03-25 
> tech_datestamp=2016-03-26 
> tech_datestamp=2016-03-27 
> tech_datestamp=2016-03-28 
> tech_datestamp=2016-03-29 
> tech_datestamp=2016-03-30 
> tech_datestamp=2016-03-31 
> tech_datestamp=2016-04-01 
> tech_datestamp=2016-04-02 
> tech_datestamp=2016-04-03 
> tech_datestamp=2016-04-04 
> tech_datestamp=2016-04-05 
> tech_datestamp=2016-04-06 
> Time taken: 0.417 seconds, Fetched: 29 row(s) 
> {code}
> Ideally msck repair should ignore subdirectory if that additional partition 
> column doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14098) Logging task properties, and environment variables might contain passwords

2016-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503747#comment-15503747
 ] 

Hive QA commented on HIVE-14098:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829192/HIVE-14098.2-branch-2.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 253 failed/errored test(s), 10352 tests 
executed
*Failed tests:*
{noformat}
249_TestHWISessionManager - did not produce a TEST-*.xml file
382_TestMsgBusConnection - did not produce a TEST-*.xml file
771_TestHiveDruidQueryBasedInputFormat - did not produce a TEST-*.xml file
772_TestDruidSerDe - did not produce a TEST-*.xml file
782_TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file
783_TestJdbcWithMiniKdc - did not produce a TEST-*.xml file
784_TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file
786_TestJdbcWithDBTokenStore - did not produce a TEST-*.xml file
787_TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file
788_TestJdbcNonKrbSASLWithMiniKdc - did not produce a TEST-*.xml file
790_TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_json_serde1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13

[jira] [Updated] (HIVE-14186) Display the UDF exception message in MapReduce in beeline console

2016-09-19 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14186:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Yongzhi for reviewing.

> Display the UDF exception message in MapReduce in beeline console 
> --
>
> Key: HIVE-14186
> URL: https://issues.apache.org/jira/browse/HIVE-14186
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-14186.1.patch
>
>
> Currently, when a Mapper or Reducer fails, the Beeline console will print the 
> following error:
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2) 
> It would be helpful if we could print the exceptions from MapReduce to the 
> Beeline console directly, so you don't need to dig into the MR logs to find them.
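
A rough sketch of how the MapReduce diagnostics could be surfaced, using the stable mapred client API; the class and method names are illustrative, not the attached patch:

{code:java}
import java.io.IOException;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class FailedTaskDiagnostics {
  // Illustrative only: after a failed MapRedTask, pull per-task diagnostics
  // so they can be echoed to the Beeline console.
  static void printDiagnostics(RunningJob job) throws IOException {
    int start = 0;
    TaskCompletionEvent[] events;
    while ((events = job.getTaskCompletionEvents(start)).length > 0) {
      for (TaskCompletionEvent event : events) {
        if (event.getTaskStatus() == TaskCompletionEvent.Status.FAILED) {
          for (String diag : job.getTaskDiagnostics(event.getTaskAttemptId())) {
            System.err.println(diag); // e.g. the UDF's exception message
          }
        }
      }
      start += events.length;
    }
  }
}
{code}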



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2496) Allow ALTER TABLE RENAME between schemas

2016-09-19 Thread Ian Cook (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503603#comment-15503603
 ] 

Ian Cook commented on HIVE-2496:


I believe this was resolved in Hive 0.14.0, so this issue should be closed.

> Allow ALTER TABLE RENAME between schemas
> 
>
> Key: HIVE-2496
> URL: https://issues.apache.org/jira/browse/HIVE-2496
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Patrick Angeles
> Attachments: HIVE-2496.1.patch, HIVE-2496.2.patch
>
>
> Currently, this is not allowed which is unfortunate:
> ALTER TABLE db1.foo RENAME TO db2.foo ;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-19 Thread Gabor Szadovszky (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503484#comment-15503484
 ] 

Gabor Szadovszky commented on HIVE-14714:
-

Thanks a lot for the hint, [~lirui]. The fix of [HIVE-13895] should solve the 
waiting problem. 

However, if child.waitFor() is interrupted while the related process still 
generates some output, the IOException in the redirector threads will be 
logged. (This might occur if the related Spark configs are modified.) I think 
these exceptions might be misleading. 
So I would make a minimal modification to swallow these IOExceptions in case we 
are about to stop the remote driver (isAlive is false). What do you think?
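
A minimal sketch of the proposed change, assuming a redirector loop like the one in the stack trace above; the isAlive flag stands in for the client's shutdown state, and the names are illustrative:

{code:java}
import java.io.BufferedReader;
import java.io.IOException;

class Redirector implements Runnable {
  private final BufferedReader in;
  private volatile boolean isAlive = true;

  Redirector(BufferedReader in) { this.in = in; }

  void stop() { isAlive = false; }

  @Override
  public void run() {
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // forward child output
      }
    } catch (IOException e) {
      // During shutdown the stream is expected to close under us, so the
      // exception is only worth logging while the driver should be alive.
      if (isAlive) {
        System.err.println("Error in redirector thread: " + e);
      }
    }
  }
}
{code}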

> Finishing Hive on Spark causes "java.io.IOException: Stream closed"
> ---
>
> Key: HIVE-14714
> URL: https://issues.apache.org/jira/browse/HIVE-14714
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
> Attachments: HIVE-14714.2.patch, HIVE-14714.patch
>
>
> After executing a Hive command with Spark, finishing the Beeline session or
> even switching the engine causes an IOException. The log below is from executing
> Ctrl-D to finish the session, but "!quit" or even "set hive.execution.engine=mr;"
> causes the issue as well.
> From HS2 log:
> {code}
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote 
> driver, interrupting...
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [Driver]: Waiting thread interrupted, killing child process.
> 2016-09-06 16:15:12,296 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [stderr-redir-1]: Error in redirector thread.
> java.io.IOException: Stream closed
> at 
> java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:154)
> at java.io.BufferedReader.readLine(BufferedReader.java:317)
> at java.io.BufferedReader.readLine(BufferedReader.java:382)
> at 
> org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14753) Track the number of open/closed/abandoned sessions in HS2

2016-09-19 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503496#comment-15503496
 ] 

Barna Zsombor Klara commented on HIVE-14753:


Posted on rb: https://reviews.apache.org/r/52029/

> Track the number of open/closed/abandoned sessions in HS2
> -
>
> Key: HIVE-14753
> URL: https://issues.apache.org/jira/browse/HIVE-14753
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> We should be able to track the number of sessions since the startup of the HS2 
> instance, as well as the average lifetime of a session.
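
A rough sketch of what the bookkeeping could look like with Codahale metrics counters; the metric names are hypothetical, and real HS2 metrics would go through Hive's own metrics layer:

{code:java}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class SessionMetrics {
  private final Counter open;
  private final Counter closed;
  private final Counter abandoned;

  public SessionMetrics(MetricRegistry registry) {
    open = registry.counter("hs2_open_sessions");           // hypothetical names
    closed = registry.counter("hs2_closed_sessions");
    abandoned = registry.counter("hs2_abandoned_sessions");
  }

  public void onOpen() { open.inc(); }
  public void onClose() { closed.inc(); open.dec(); }
  public void onAbandon() { abandoned.inc(); open.dec(); }
}
{code}

Average session lifetime would fall out of a timer around open/close rather than plain counters.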



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14098) Logging task properties, and environment variables might contain passwords

2016-09-19 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14098:
--
Attachment: HIVE-14098.2-branch-2.1.patch

Just for the QA test, no real change

> Logging task properties, and environment variables might contain passwords
> --
>
> Key: HIVE-14098
> URL: https://issues.apache.org/jira/browse/HIVE-14098
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Logging, Spark
>Affects Versions: 2.1.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 2.2.0
>
> Attachments: HIVE-14098-branch-2.1.patch, 
> HIVE-14098.2-branch-2.1.patch, HIVE-14098.2.patch, HIVE-14098.patch
>
>
> Hive MapredLocalTask Can Print Environment Passwords, like 
> -Djavax.net.ssl.trustStorePassword.
> The same could happen, when logging spark properties



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14487) Add REBUILD statement for materialized views

2016-09-19 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14487:
---
Assignee: (was: Alan Gates)

> Add REBUILD statement for materialized views
> 
>
> Key: HIVE-14487
> URL: https://issues.apache.org/jira/browse/HIVE-14487
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> Support for rebuilding existing materialized views. The statement is the 
> following:
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14484) Extensions for initial materialized views implementation

2016-09-19 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14484:
---
Issue Type: Improvement  (was: Bug)

> Extensions for initial materialized views implementation
> 
>
> Key: HIVE-14484
> URL: https://issues.apache.org/jira/browse/HIVE-14484
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Follow-up of HIVE-14249.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-19 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503061#comment-15503061
 ] 

Rui Li commented on HIVE-14714:
---

bq. These threads are running in HS2, therefore they won't be terminated when 
Beeline is closed.
Yeah, but if we use CLI, these threads run in the CLI. Then we may lose some 
output from spark-submit after CLI exits.
Thinking more about this, I think the problem is more specific to yarn-cluster 
mode, right? Because in yarn-client mode, RemoteDriver runs in spark-submit, so 
it should shut down properly.
For yarn-cluster mode, spark-submit is just a monitor for the Spark app. It may 
be acceptable to lose some output from it. But on the other hand, the user can set 
{{spark.yarn.submit.waitAppCompletion=false}} so that spark-submit exits after 
the app is submitted, in order to avoid this hanging issue. HIVE-13895 actually 
made this the default.
I wonder if that should be enough for the issue.

> Finishing Hive on Spark causes "java.io.IOException: Stream closed"
> ---
>
> Key: HIVE-14714
> URL: https://issues.apache.org/jira/browse/HIVE-14714
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
> Attachments: HIVE-14714.2.patch, HIVE-14714.patch
>
>
> After executing a Hive command with Spark, finishing the Beeline session or
> even switching the engine causes an IOException. The log below is from executing
> Ctrl-D to finish the session, but "!quit" or even "set hive.execution.engine=mr;"
> causes the issue as well.
> From HS2 log:
> {code}
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote 
> driver, interrupting...
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [Driver]: Waiting thread interrupted, killing child process.
> 2016-09-06 16:15:12,296 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [stderr-redir-1]: Error in redirector thread.
> java.io.IOException: Stream closed
> at 
> java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:154)
> at java.io.BufferedReader.readLine(BufferedReader.java:317)
> at java.io.BufferedReader.readLine(BufferedReader.java:382)
> at 
> org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-19 Thread Gabor Szadovszky (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502706#comment-15502706
 ] 

Gabor Szadovszky commented on HIVE-14714:
-

Hi [~lirui],

# The root cause of the spark-submit hang is that the refresh interval for 
checking the process might be set so large that it won't get the new state of the 
remote driver in time. This value can be modified by the user, therefore I 
would like to handle this situation.
# These threads are running in HS2, therefore they won't be terminated when 
Beeline is closed. The only effect on Beeline is that it doesn't have to 
wait for the timeout, as the method stop() will return immediately. (If 
HS2 is running in embedded mode, then these threads will be terminated, but that 
was the original behaviour, which I haven't changed.)

> Finishing Hive on Spark causes "java.io.IOException: Stream closed"
> ---
>
> Key: HIVE-14714
> URL: https://issues.apache.org/jira/browse/HIVE-14714
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
> Attachments: HIVE-14714.2.patch, HIVE-14714.patch
>
>
> After executing a Hive command with Spark, finishing the Beeline session or
> even switching the engine causes an IOException. The log below is from executing
> Ctrl-D to finish the session, but "!quit" or even "set hive.execution.engine=mr;"
> causes the issue as well.
> From HS2 log:
> {code}
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote 
> driver, interrupting...
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [Driver]: Waiting thread interrupted, killing child process.
> 2016-09-06 16:15:12,296 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [stderr-redir-1]: Error in redirector thread.
> java.io.IOException: Stream closed
> at 
> java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:154)
> at java.io.BufferedReader.readLine(BufferedReader.java:317)
> at java.io.BufferedReader.readLine(BufferedReader.java:382)
> at 
> org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14777) Add support of Spark-2.0.0 in Hive-2.X.X

2016-09-19 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li resolved HIVE-14777.
---
   Resolution: Duplicate
Fix Version/s: (was: 2.2.0)

> Add support of Spark-2.0.0 in Hive-2.X.X
> 
>
> Key: HIVE-14777
> URL: https://issues.apache.org/jira/browse/HIVE-14777
> Project: Hive
>  Issue Type: Wish
>Reporter: Oleksiy Sayankin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14777) Add support of Spark-2.0.0 in Hive-2.X.X

2016-09-19 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502667#comment-15502667
 ] 

Rui Li commented on HIVE-14777:
---

Closing this one as a dup of HIVE-14029.

> Add support of Spark-2.0.0 in Hive-2.X.X
> 
>
> Key: HIVE-14777
> URL: https://issues.apache.org/jira/browse/HIVE-14777
> Project: Hive
>  Issue Type: Wish
>Reporter: Oleksiy Sayankin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14785) return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

2016-09-19 Thread vinitkumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502557#comment-15502557
 ] 

vinitkumar commented on HIVE-14785:
---

This is the error it's showing.
"Log Type: stderr
Log UpLoadTime: 19-Sep-2016 07:01:21
Log Length: 88
Error: Could not find or load main class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Log Type: stdout
Log UpLoadTime: 19-Sep-2016 07:01:21
Log Length: 0"

> return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> ---
>
> Key: HIVE-14785
> URL: https://issues.apache.org/jira/browse/HIVE-14785
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.1
> Environment: Hortonworks, Talend , Hive
>Reporter: vinitkumar
>
> Hi,
> I am creating a partitioned ORC table in Hive using Talend. But after executing 
> the job I am getting this error:
> Error while processing statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> Can you please suggest what the issue could be?
> Thanks,
> Vinitkumar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14785) return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

2016-09-19 Thread vinitkumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502553#comment-15502553
 ] 

vinitkumar commented on HIVE-14785:
---

I opened the log file. It's giving an error like 
"Log Type: stderr
Log UpLoadTime: 19-Sep-2016 07:01:21
Log Length: 88
Error: Could not find or load main class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Log Type: stdout
Log UpLoadTime: 19-Sep-2016 07:01:21
Log Length: 0 "

> return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> ---
>
> Key: HIVE-14785
> URL: https://issues.apache.org/jira/browse/HIVE-14785
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.1
> Environment: Hortonworks, Talend , Hive
>Reporter: vinitkumar
>
> Hi,
> I am creating a partitioned ORC table in Hive using Talend. But after executing 
> the job I am getting this error:
> Error while processing statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> Can you please suggest what the issue could be?
> Thanks,
> Vinitkumar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)