[jira] [Updated] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info

2017-05-31 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-16672:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

PTest passed locally. Committed to branch 2.3.

> Parquet vectorization doesn't work for tables with partition info
> -
>
> Key: HIVE-16672
> URL: https://issues.apache.org/jira/browse/HIVE-16672
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16672.001.patch, HIVE-16672.002.patch, 
> HIVE-16672-branch2.3.patch
>
>
> VectorizedParquetRecordReader doesn't check and update partition cols; this 
> should be fixed.
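
For context, here is a minimal HiveQL sketch of the kind of query affected; the 
table, column, and partition names are illustrative assumptions and are not taken 
from the patch.

{code}
-- Illustrative setup: a partitioned Parquet table read through the vectorized path.
SET hive.vectorized.execution.enabled=true;

CREATE TABLE parquet_part_tbl (id INT, val STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

INSERT INTO TABLE parquet_part_tbl PARTITION (dt='2017-05-31') VALUES (1, 'a');

-- Before the fix, VectorizedParquetRecordReader did not populate the partition
-- column dt, so a vectorized scan projecting it could return wrong values.
SELECT id, val, dt FROM parquet_part_tbl WHERE dt = '2017-05-31';
{code}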



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-05-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032509#comment-16032509
 ] 

Hive QA commented on HIVE-16730:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870728/HIVE-16730.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 10811 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_text] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_mapjoin] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_character_length]
 (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_octet_length] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] 
(batchId=72)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_2]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_4]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_mapjoin]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_windowing_navfn]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_date_funcs]
 (batchId=158)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_mapjoin]
 (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=134)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5494/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5494/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5494/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870728 - PreCommit-HIVE-Build

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch
>
>
> With the HIVE-16589 change ("Vectorization: Support Complex Types and GroupBy 
> modes PARTIAL2, FINAL, and COMPLETE for AVG"), the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2017-05-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032497#comment-16032497
 ] 

Josh Rosen edited comment on HIVE-16391 at 6/1/17 5:59 AM:
---

I tried to see whether Spark can consume existing Hive 1.2.1 artifacts, but it 
looks like neither the regular nor {{core}} hive-exec artifacts can work:

* We can't use the regular Hive uber-JAR artifacts because they include many 
transitive dependencies but do not relocate those dependencies' classes into a 
private namespace, so this will cause multiple versions of the same class to be 
included on the classpath. To see this, note the long list of artifacts at 
https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml#L685 but there is 
only one relocation pattern (for Kryo).
* We can't use the {{core}}-classified artifact:
** We actually need Kryo to be shaded in {{hive-exec}} because Spark now uses 
Kryo 3 (which is needed by Chill 0.8.x, which is needed for Scala 2.12) while 
Hive uses Kryo 2.
** In addition, I think that Spark needs to shade Hive's 
{{com.google.protobuf:protobuf-java}} dependency.
** The published {{hive-exec}} POM is a "dependency-reduced" POM which doesn't 
declare {{hive-exec}}'s transitive dependencies. To see this, compare the 
declared dependencies in the published POM in Maven Central 
(http://central.maven.org/maven2/org/apache/hive/hive-exec/1.2.1/hive-exec-1.2.1.pom)
 to the dependencies in the source repo's POM: 
https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml. The lack of 
declared dependencies creates an additional layer of pain for us when consuming 
the {{core}} JAR because we now have to shoulder the burden of declaring 
explicit dependencies on {{hive-exec}}'s transitive dependencies (since they're 
no longer bundled in an uber JAR when we use the {{core}} JAR), making it 
harder to use tools like Maven's {{dependency:tree}} to help us spot potential 
dep. conflicts.

Spark's current custom Hive fork is effectively making three changes compared 
to Hive 1.2.1 in order to work around the above problems, plus some legacy 
issues which are no longer relevant:

* Remove the shading/bundling of most non-Hive classes, with the exception of 
Kryo and Protobuf. This has the effect of making the published POM declare 
proper transitive dependencies, easing the dep. management story in Spark's 
POMs, while still ensuring that we relocate classes that conflict with Spark.
* Package the hive-shims into the hive-exec JAR. I don't think that this is 
strictly necessary.
* Downgrade Kryo to 2.21. This isn't necessary anymore: there was an earlier 
time when we purposely _unshaded_ Kryo and pinned Hive's version to match 
Spark's. The only reason this change is present today is to minimize the 
diff between versions 1 and 2 of Spark's Hive fork.

For the full details, see 
https://github.com/apache/hive/compare/release-1.2.1...JoshRosen:release-1.2.1-spark2,
 which compares the current Version 2 of our Hive fork to stock Hive 1.2.1.

Maven classifiers do not allow the declaration of different dependencies for 
artifacts depending on their classifiers, so if we wanted to publish a 
{{hive-exec core}}-like artifact which declares its transitive dependencies 
then this would need to be done under a new Maven artifact name or new version 
(e.g. Hive 1.2.2-spark).

That said, proper declaration of transitive dependencies isn't a hard blocker 
for us: a long, long, long time ago, I think that Spark may have actually built 
with a stock {{core}} artifact and explicitly declared the transitive deps, so 
if we've handled that dependency declaration before then we can do it again at 
the cost of some pain in the future if we want to bump to Hive 2.x.

Therefore, I think the minimal change needed in Hive's build is to add a new 
classifier, say {{core-spark}}, which behaves like {{core}} except that it 
shades and relocates Kryo and Protobuf. If this artifact existed then I think 
Spark could use that classified artifact, declare an explicit dependency on the 
shim artifacts (assuming Kryo and Protobuf don't need to be shaded there) and 
explicitly pull in all of {{hive-exec}}'s transitive dependencies. This avoids 
the need to publish separate _versions_ for Spark: instead, Spark would just 
consume a differently-packaged/differently-classified version of a stock Hive 
release.

If we go with this latter approach, then I guess Hive would need to publish 
1.2.3 or 1.2.2.1 in order to introduce the new classified artifact.

Does this sound like a reasonable approach? Or would it make more sense to have 
a separate Hive branch and versioning scheme for Spark (e.g. 
{{branch-1.2-spark}} and Hive {{1.2.1-spark}})? I lean towards the former 
approach (releasing 1.2.3 with an additional Spark-specific classifier), 
especially if we want to fix bugs or make functional / non-packaging changes

[jira] [Updated] (HIVE-16799) Control the max number of task for a stage in a spark job

2017-05-31 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16799:
---
Status: Patch Available  (was: Open)

> Control the max number of task for a stage in a spark job
> -
>
> Key: HIVE-16799
> URL: https://issues.apache.org/jira/browse/HIVE-16799
> Project: Hive
>  Issue Type: Improvement
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16799.patch
>
>
> HIVE-16552 gives admins an option to control the maximum number of tasks a 
> Spark job may have. However, this may not be sufficient, as it tends to 
> penalize jobs that have many stages while favoring jobs that have fewer 
> stages. Ideally, we should also limit the number of tasks in a stage, which 
> is closer to the maximum number of mappers or reducers in an MR job.
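
As a hedged illustration of the knobs being discussed, the session settings below 
show the intended usage; the property names are assumptions based on HIVE-16552 
and this patch and are not confirmed in this thread.

{code}
-- Cap the total number of tasks a Spark job may contain (the HIVE-16552 limit);
-- a negative value means unlimited.
SET hive.spark.job.max.tasks=10000;

-- Proposed complement: cap the number of tasks in any single stage, analogous to
-- limiting the number of mappers or reducers in an MR job.
SET hive.spark.stage.max.tasks=2000;
{code}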



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16799) Control the max number of task for a stage in a spark job

2017-05-31 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16799:
---
Attachment: HIVE-16799.patch

> Control the max number of task for a stage in a spark job
> -
>
> Key: HIVE-16799
> URL: https://issues.apache.org/jira/browse/HIVE-16799
> Project: Hive
>  Issue Type: Improvement
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16799.patch
>
>
> HIVE-16552 gives admins an option to control the maximum number of tasks a 
> Spark job may have. However, this may not be sufficient, as it tends to 
> penalize jobs that have many stages while favoring jobs that have fewer 
> stages. Ideally, we should also limit the number of tasks in a stage, which 
> is closer to the maximum number of mappers or reducers in an MR job.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2017-05-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032497#comment-16032497
 ] 

Josh Rosen commented on HIVE-16391:
---

I tried to see whether Spark can consume existing Hive 1.2.1 artifacts, but it 
looks like neither the regular nor {{core}} hive-exec artifacts can work:

* We can't use the regular Hive uber-JAR artifacts because they include many 
transitive dependencies but do not relocate those dependencies' classes into a 
private namespace, so this will cause multiple versions of the same class to be 
included on the classpath. To see this, note the long list of artifacts at 
https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml#L685 but there is 
only one relocation pattern (for Kryo).
* We can't use the {{core}}-classified artifact:
** We actually need Kryo to be shaded in {{hive-exec}} because Spark now uses 
Kryo 3 (which is needed by Chill 0.8.x, which is needed for Scala 2.12) while 
Hive uses Kryo 2.
** In addition, I think that Spark needs to shade Hive's 
{{com.google.protobuf:protobuf-java}} dependency.
** The published {{hive-exec}} POM is a "dependency-reduced" POM which doesn't 
declare {{hive-exec}}'s transitive dependencies. To see this, compare the 
declared dependencies in the published POM in Maven Central 
(http://central.maven.org/maven2/org/apache/hive/hive-exec/1.2.1/hive-exec-1.2.1.pom)
 to the dependencies in the source repo's POM: 
https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml. The lack of 
declared dependencies creates an additional layer of pain for us when consuming 
the {{core}} JAR because we now have to shoulder the burden of declaring 
explicit dependencies on {{hive-exec}}'s transitive dependencies (since they're 
no longer bundled in an uber JAR when we use the {{core}} JAR), making it 
harder to use tools like Maven's {{dependency:tree}} to help us spot potential 
dep. conflicts.

Spark's current custom Hive fork is effectively making three changes compared 
to Hive 1.2.1 in order to work around the above problems, plus some legacy 
issues which are no longer relevant:

* Remove the shading/bundling of most non-Hive classes, with the exception of 
Kryo and Protobuf. This has the effect of making the published POM 
non-dependency-reduced, easing the dep. management story in Spark's POMs, while 
still ensuring that we relocate classes that conflict with Spark.
* Package the hive-shims into the hive-exec JAR. I don't think that this is 
strictly necessary.
* Downgrade Kryo to 2.21. This isn't necessary anymore: there was an earlier 
time when we purposely _unshaded_ Kryo and pinned Hive's version to match 
Spark's. The only reason this change is present today is to minimize the 
diff between versions 1 and 2 of Spark's Hive fork.

For the full details, see 
https://github.com/apache/hive/compare/release-1.2.1...JoshRosen:release-1.2.1-spark2,
 which compares the current Version 2 of our Hive fork to stock Hive 1.2.1.

Maven classifiers do not allow the declaration of different dependencies for 
artifacts depending on their classifiers, so if we wanted to publish a 
{{hive-exec core}}-like artifact which declares its transitive dependencies 
then this would need to be done under a new Maven artifact name or new version 
(e.g. Hive 1.2.2-spark).

That said, proper declaration of transitive dependencies isn't a hard blocker 
for us: a long, long, long time ago, I think that Spark may have actually built 
with a stock {{core}} artifact and explicitly declared the transitive deps, so 
if we've handled that dependency declaration before then we can do it again at 
the cost of some pain in the future if we want to bump to Hive 2.x.

Therefore, I think the minimal change needed in Hive's build is to add a new 
classifier, say {{core-spark}}, which behaves like {{core}} except that it 
shades and relocates Kryo and Protobuf. If this artifact existed then I think 
Spark could use that classified artifact, declare an explicit dependency on the 
shim artifacts (assuming Kryo and Protobuf don't need to be shaded there) and 
explicitly pull in all of {{hive-exec}}'s transitive dependencies. This avoids 
the need to publish separate _versions_ for Spark: instead, Spark would just 
consume a differently-packaged/differently-classified version of a stock Hive 
release.

If we go with this latter approach, then I guess Hive would need to publish 
1.2.3 or 1.2.2.1 in order to introduce the new classified artifact.

Does this sound like a reasonable approach? Or would it make more sense to have 
a separate Hive branch and versioning scheme for Spark (e.g. 
{{branch-1.2-spark}} and Hive {{1.2.1-spark}})? I lean towards the former 
approach (releasing 1.2.3 with an additional Spark-specific classifier), 
especially if we want to fix bugs or make functional / non-packaging changes 
later down the road (I think [~ste...@apache.org] had a few c

[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-05-31 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-11297:

Status: Patch Available  (was: Open)

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do 
> the table scan multiple times.
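
A minimal sketch of the query shape being described follows; the table and column 
names are assumptions. The small table dim is the one that would feed two Spark 
partition pruning sinks, one per partition column of fact.

{code}
SET hive.spark.dynamic.partition.pruning=true;

-- fact is partitioned by (part_key1, part_key2); dim is a small dimension table.
-- Today this produces two operator trees rooted at the same scan of dim, one
-- pruning sink per partition column.
SELECT COUNT(*)
FROM fact f
JOIN dim d
  ON f.part_key1 = d.key1
 AND f.part_key2 = d.key2
WHERE d.flag = 'on';
{code}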



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (HIVE-16052) MM tables: add exchange partition test after ACID integration

2017-05-31 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng reopened HIVE-16052:
--
  Assignee: Wei Zheng

Sure, will add the test for MM

> MM tables: add exchange partition test after ACID integration
> -
>
> Key: HIVE-16052
> URL: https://issues.apache.org/jira/browse/HIVE-16052
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Fix For: hive-14535
>
>
> exchgpartition2lel test fails if all tables are changed to MM, because of 
> write ID mismatch between directories and tables when exchanging partition 
> directories between tables. ACID should probably fix this because transaction 
> IDs are global.
> We should add a test after integrating with ACID; if it doesn't work for some 
> other reason, we can either implement it as moving to a new mm_id/txn_id in 
> each affected partition, or block it on MM tables.
> cc [~wzheng]
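
For reference, the operation exercised by exchgpartition2lel.q boils down to the 
DDL below (a minimal sketch; the table and partition names are assumptions). It 
moves a partition's directory from one table to another, which is where the 
per-table write IDs of MM tables come into conflict.

{code}
-- Both tables must have the same schema and partition spec; the partition's data
-- directory is moved from t_source into t_target.
ALTER TABLE t_target EXCHANGE PARTITION (part_col = '1') WITH TABLE t_source;
{code}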



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-05-31 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16730:
--
Status: Patch Available  (was: Open)

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch
>
>
> With the HIVE-16589 change ("Vectorization: Support Complex Types and GroupBy 
> modes PARTIAL2, FINAL, and COMPLETE for AVG"), the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-05-31 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032440#comment-16032440
 ] 

Teddy Choi commented on HIVE-16730:
---

This patch makes HIVE-16589 able to evolve the schema in STRUCT. However, column 
c5 of the last record in schema_evol_text_vec_part_all_complex.q.out becomes null 
instead of 1255178165 after applying HIVE-16730 on top of HIVE-16589. I think 
it's related to LazySimpleDeserializeRead or LazySimpleSerializeWrite.

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch
>
>
> With the HIVE-16589 change ("Vectorization: Support Complex Types and GroupBy 
> modes PARTIAL2, FINAL, and COMPLETE for AVG"), the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-05-31 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032440#comment-16032440
 ] 

Teddy Choi edited comment on HIVE-16730 at 6/1/17 4:50 AM:
---

This patch makes HIVE-16589 able to evolve the schema in complex types. However, 
column c5 of the last record in schema_evol_text_vec_part_all_complex.q.out 
becomes null instead of 1255178165 after applying HIVE-16730 on top of 
HIVE-16589. I think it's related to LazySimpleDeserializeRead or 
LazySimpleSerializeWrite.


was (Author: teddy.choi):
This patch makes HIVE-16589 able to evolve the schema in STRUCT. However, column 
c5 of the last record in schema_evol_text_vec_part_all_complex.q.out becomes null 
instead of 1255178165 after applying HIVE-16730 on top of HIVE-16589. I think 
it's related to LazySimpleDeserializeRead or LazySimpleSerializeWrite.

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch
>
>
> With the HIVE-16589 change ("Vectorization: Support Complex Types and GroupBy 
> modes PARTIAL2, FINAL, and COMPLETE for AVG"), the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-05-31 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16730:
--
Attachment: HIVE-16730.1.patch

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch
>
>
> With the HIVE-16589 change ("Vectorization: Support Complex Types and GroupBy 
> modes PARTIAL2, FINAL, and COMPLETE for AVG"), the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize

2017-05-31 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16731:
--
Attachment: HIVE-16731.1.patch

> Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null 
> end" that involves a column name or expression THEN or ELSE vectorize
> ---
>
> Key: HIVE-16731
> URL: https://issues.apache.org/jira/browse/HIVE-16731
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions.
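
A minimal sketch of the pattern from the title (the table name is an assumption): 
a CASE whose THEN branch is a column reference rather than a constant, which 
currently falls back to VectorUDFAdaptor instead of a native vectorized expression.

{code}
SET hive.vectorized.execution.enabled=true;

SELECT CASE WHEN (day_name = 'Sunday') THEN column1 ELSE null END
FROM some_table;
{code}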



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize

2017-05-31 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16731:
--
Attachment: (was: HIVE-16731.1.patch)

> Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null 
> end" that involves a column name or expression THEN or ELSE vectorize
> ---
>
> Key: HIVE-16731
> URL: https://issues.apache.org/jira/browse/HIVE-16731
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info

2017-05-31 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16672:

Attachment: HIVE-16672-branch2.3.patch

> Parquet vectorization doesn't work for tables with partition info
> -
>
> Key: HIVE-16672
> URL: https://issues.apache.org/jira/browse/HIVE-16672
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16672.001.patch, HIVE-16672.002.patch, 
> HIVE-16672-branch2.3.patch
>
>
> VectorizedParquetRecordReader doesn't check and update partition cols; this 
> should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info

2017-05-31 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16672:

   Fix Version/s: 2.3.0
Target Version/s: 2.3.0, 3.0.0  (was: 3.0.0)
  Status: Patch Available  (was: Reopened)

> Parquet vectorization doesn't work for tables with partition info
> -
>
> Key: HIVE-16672
> URL: https://issues.apache.org/jira/browse/HIVE-16672
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16672.001.patch, HIVE-16672.002.patch, 
> HIVE-16672-branch2.3.patch
>
>
> VectorizedParquetRecordReader doesn't check and update partition cols; this 
> should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info

2017-05-31 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma reopened HIVE-16672:
-

Reopening the ticket to upload a patch for branch 2.3.

> Parquet vectorization doesn't work for tables with partition info
> -
>
> Key: HIVE-16672
> URL: https://issues.apache.org/jira/browse/HIVE-16672
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16672.001.patch, HIVE-16672.002.patch
>
>
> VectorizedParquetRecordReader doesn't check and update partition cols; this 
> should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16799) Control the max number of task for a stage in a spark job

2017-05-31 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-16799:
--


> Control the max number of task for a stage in a spark job
> -
>
> Key: HIVE-16799
> URL: https://issues.apache.org/jira/browse/HIVE-16799
> Project: Hive
>  Issue Type: Improvement
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> HIVE-16552 gives admins an option to control the maximum number of tasks a 
> Spark job may have. However, this may not be sufficient, as it tends to 
> penalize jobs that have many stages while favoring jobs that have fewer 
> stages. Ideally, we should also limit the number of tasks in a stage, which 
> is closer to the maximum number of mappers or reducers in an MR job.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-05-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032301#comment-16032301
 ] 

Sergey Shelukhin commented on HIVE-12631:
-

Will take a look this week

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.
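
As a hedged illustration of the target workload (the table name and settings are 
assumptions, not taken from the patch): an ORC ACID table read while LLAP IO is 
enabled, which is the combination the LLAP read path cannot handle yet.

{code}
SET hive.llap.io.enabled=true;
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Transactional tables require ORC and, at this point, bucketing.
CREATE TABLE acid_orc_tbl (id INT, val STRING)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
{code}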



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16772) Support TPCDS query11.q in PerfCliDriver

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16772:
--

Assignee: Pengcheng Xiong

> Support TPCDS query11.q in PerfCliDriver
> 
>
> Key: HIVE-16772
> URL: https://issues.apache.org/jira/browse/HIVE-16772
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 54:22 Invalid column 
> reference 'customer_preferred_cust_flag'
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11744)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11692)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16654) Optimize a combination of avg(), sum(), count(distinct) etc

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16654:
---
Affects Version/s: 2.0.0

> Optimize a combination of avg(), sum(), count(distinct) etc
> ---
>
> Key: HIVE-16654
> URL: https://issues.apache.org/jira/browse/HIVE-16654
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16654.01.patch, HIVE-16654.02.patch, 
> HIVE-16654.03.patch, HIVE-16654.04.patch
>
>
> an example rewrite for q28 of tpcds is 
> {code}
> (select LP as B1_LP ,CNT  as B1_CNT,CNTD as B1_CNTD
>   from (select sum(xc0) / sum(xc1) as LP, sum(xc1) as CNT, count(1) as 
> CNTD from (select sum(ss_list_price) as xc0, count(ss_list_price) as xc1 from 
> store_sales  where 
> ss_list_price is not null and ss_quantity between 0 and 5
> and (ss_list_price between 11 and 11+10 
>  or ss_coupon_amt between 460 and 460+1000
>  or ss_wholesale_cost between 14 and 14+20)
>  group by ss_list_price) ss0) ss1) B1
> {code}
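
For comparison, the un-rewritten shape of that q28 fragment is roughly the 
following (reconstructed for illustration from the constants in the rewrite above, 
not copied from the benchmark text). The rewrite replaces count(distinct 
ss_list_price) with a group-by on ss_list_price plus an outer count, and derives 
the average as sum/count.

{code}
select avg(ss_list_price) B1_LP,
       count(ss_list_price) B1_CNT,
       count(distinct ss_list_price) B1_CNTD
from store_sales
where ss_quantity between 0 and 5
  and (ss_list_price between 11 and 11+10
       or ss_coupon_amt between 460 and 460+1000
       or ss_wholesale_cost between 14 and 14+20);
{code}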



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16654) Optimize a combination of avg(), sum(), count(distinct) etc

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16654:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Optimize a combination of avg(), sum(), count(distinct) etc
> ---
>
> Key: HIVE-16654
> URL: https://issues.apache.org/jira/browse/HIVE-16654
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16654.01.patch, HIVE-16654.02.patch, 
> HIVE-16654.03.patch, HIVE-16654.04.patch
>
>
> an example rewrite for q28 of tpcds is 
> {code}
> (select LP as B1_LP ,CNT  as B1_CNT,CNTD as B1_CNTD
>   from (select sum(xc0) / sum(xc1) as LP, sum(xc1) as CNT, count(1) as 
> CNTD from (select sum(ss_list_price) as xc0, count(ss_list_price) as xc1 from 
> store_sales  where 
> ss_list_price is not null and ss_quantity between 0 and 5
> and (ss_list_price between 11 and 11+10 
>  or ss_coupon_amt between 460 and 460+1000
>  or ss_wholesale_cost between 14 and 14+20)
>  group by ss_list_price) ss0) ss1) B1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16654) Optimize a combination of avg(), sum(), count(distinct) etc

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16654:
---
Fix Version/s: 3.0.0

> Optimize a combination of avg(), sum(), count(distinct) etc
> ---
>
> Key: HIVE-16654
> URL: https://issues.apache.org/jira/browse/HIVE-16654
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16654.01.patch, HIVE-16654.02.patch, 
> HIVE-16654.03.patch, HIVE-16654.04.patch
>
>
> an example rewrite for q28 of tpcds is 
> {code}
> (select LP as B1_LP ,CNT  as B1_CNT,CNTD as B1_CNTD
>   from (select sum(xc0) / sum(xc1) as LP, sum(xc1) as CNT, count(1) as 
> CNTD from (select sum(ss_list_price) as xc0, count(ss_list_price) as xc1 from 
> store_sales  where 
> ss_list_price is not null and ss_quantity between 0 and 5
> and (ss_list_price between 11 and 11+10 
>  or ss_coupon_amt between 460 and 460+1000
>  or ss_wholesale_cost between 14 and 14+20)
>  group by ss_list_price) ss0) ss1) B1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-31 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032275#comment-16032275
 ] 

Vihang Karajgaonkar commented on HIVE-16771:


Thanks for the review [~spena] and [~ngangam]

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch, HIVE-16771.04.patch
>
>
> HIVE-16723 gives the ability to plug in a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic, it should depend on the 
> IMetastoreSchemaInfo implementation that is configured to get the metastore 
> schema version information from the database. It should also not assume and 
> hardcode the scripts directory itself; instead, it should ask the 
> MetastoreSchemaInfo class for the metastore scripts directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16774:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

thanks [~ashutoshc] for the review

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16774.01.patch, HIVE-16774.02.patch
>
>
> query47.q query57.q
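
A minimal sketch of what the title describes follows; the table name and the 
position-alias setting are assumptions. The ordinal in ORDER BY has to resolve 
against the column list expanded from SELECT *.

{code}
SET hive.groupby.orderby.position.alias=true;

-- Order by the second column of the expanded SELECT * column list.
SELECT * FROM store_sales ORDER BY 2;
{code}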



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16774:
---
Affects Version/s: 2.0.0

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16774.01.patch, HIVE-16774.02.patch
>
>
> query47.q query57.q



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16774:
---
Fix Version/s: 3.0.0

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16774.01.patch, HIVE-16774.02.patch
>
>
> query47.q query57.q



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16798) Flaky test query14.q,query23.q

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16798:
---
Attachment: HIVE-16798.01.patch

> Flaky test query14.q,query23.q
> --
>
> Key: HIVE-16798
> URL: https://issues.apache.org/jira/browse/HIVE-16798
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16798.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16798) Flaky test query14.q,query23.q

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16798:
---
Status: Patch Available  (was: Open)

> Flaky test query14.q,query23.q
> --
>
> Key: HIVE-16798
> URL: https://issues.apache.org/jira/browse/HIVE-16798
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16798.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-05-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032269#comment-16032269
 ] 

Hive QA commented on HIVE-12631:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870691/HIVE-12631.8.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10808 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=158)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5491/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5491/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5491/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870691 - PreCommit-HIVE-Build

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16798) Flaky test query14.q,query23.q

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16798:
---
Summary: Flaky test query14.q,query23.q  (was: Flaky test query14.q)

> Flaky test query14.q,query23.q
> --
>
> Key: HIVE-16798
> URL: https://issues.apache.org/jira/browse/HIVE-16798
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16798) Flaky test query14.q

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16798:
--


> Flaky test query14.q
> 
>
> Key: HIVE-16798
> URL: https://issues.apache.org/jira/browse/HIVE-16798
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16772) Support TPCDS query11.q in PerfCliDriver

2017-05-31 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032241#comment-16032241
 ] 

Pengcheng Xiong commented on HIVE-16772:


use c_preferred_cust_flag
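
In other words (an illustrative sketch only; the surrounding query11.q text is 
not reproduced here), the reference should point at the base column of the TPCDS 
customer table rather than the unresolved alias.

{code}
-- Fails with: Invalid column reference 'customer_preferred_cust_flag'
--   ... customer_preferred_cust_flag ...
-- Works, using the customer column directly:
SELECT c_customer_id, c_preferred_cust_flag
FROM customer;
{code}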

> Support TPCDS query11.q in PerfCliDriver
> 
>
> Key: HIVE-16772
> URL: https://issues.apache.org/jira/browse/HIVE-16772
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 54:22 Invalid column 
> reference 'customer_preferred_cust_flag'
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11744)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11692)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16052) MM tables: add exchange partition test after ACID integration

2017-05-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032238#comment-16032238
 ] 

Sergey Shelukhin commented on HIVE-16052:
-

Yeah might be a good idea, so that it is actually a repeatable test

> MM tables: add exchange partition test after ACID integration
> -
>
> Key: HIVE-16052
> URL: https://issues.apache.org/jira/browse/HIVE-16052
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
> Fix For: hive-14535
>
>
> exchgpartition2lel test fails if all tables are changed to MM, because of 
> write ID mismatch between directories and tables when exchanging partition 
> directories between tables. ACID should probably fix this because transaction 
> IDs are global.
> We should add a test after integrating with ACID; if it doesn't work for some 
> other reason, we can either implement it as moving to a new mm_id/txn_id in 
> each affected partition, or block it on MM tables.
> cc [~wzheng]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16052) MM tables: add exchange partition test after ACID integration

2017-05-31 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032214#comment-16032214
 ] 

Wei Zheng commented on HIVE-16052:
--

exchgpartition2lel.q. When I converted the tables to MM, it produced the correct 
result. Do we want a counterpart of this test for MM?

> MM tables: add exchange partition test after ACID integration
> -
>
> Key: HIVE-16052
> URL: https://issues.apache.org/jira/browse/HIVE-16052
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
> Fix For: hive-14535
>
>
> exchgpartition2lel test fails if all tables are changed to MM, because of 
> write ID mismatch between directories and tables when exchanging partition 
> directories between tables. ACID should probably fix this because transaction 
> IDs are global.
> We should add a test after integrating with ACID; if it doesn't work for some 
> other reason, we can either implement it as moving to a new mm_id/txn_id in 
> each affected partition, or block it on MM tables.
> cc [~wzheng]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16775) Fix HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032213#comment-16032213
 ] 

Hive QA commented on HIVE-16775:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870669/HIVE-16775.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10811 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5490/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5490/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5490/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870669 - PreCommit-HIVE-Build

> Fix HiveFilterAggregateTransposeRule when filter is always false
> 
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch, HIVE-16775.02.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}
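
A hedged sketch of the triggering pattern (the simplified query below is an 
assumption, not taken from query4.q or query74.q): a predicate that folds to FALSE 
sits above an aggregate, and after HiveFilterAggregateTransposeRule fires, the 
plan handed to ASTConverter hits the NullPointerException above.

{code}
SELECT t.total
FROM (SELECT ss_customer_sk, sum(ss_net_paid) AS total
      FROM store_sales
      GROUP BY ss_customer_sk) t
WHERE 1 = 0;
{code}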



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16052) MM tables: add exchange partition test after ACID integration

2017-05-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032211#comment-16032211
 ] 

Sergey Shelukhin commented on HIVE-16052:
-

Which test? Is it possible to add it explicitly to MM tests?

> MM tables: add exchange partition test after ACID integration
> -
>
> Key: HIVE-16052
> URL: https://issues.apache.org/jira/browse/HIVE-16052
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
> Fix For: hive-14535
>
>
> exchgpartition2lel test fails if all tables are changed to MM, because of 
> write ID mismatch between directories and tables when exchanging partition 
> directories between tables. ACID should probably fix this because transaction 
> IDs are global.
> We should add a test after integrating with ACID; if it doesn't work for some 
> other reason, we can either implement it as moving to a new mm_id/txn_id in 
> each affected partition, or block it on MM tables.
> cc [~wzheng]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-05-31 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.8.patch

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16372) Enable DDL statement for non-native tables (add/remove table properties)

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16372:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Enable DDL statement for non-native tables (add/remove table properties)
> 
>
> Key: HIVE-16372
> URL: https://issues.apache.org/jira/browse/HIVE-16372
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16372.01.patch, HIVE-16372.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16372) Enable DDL statement for non-native tables (add/remove table properties)

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16372:
---
Affects Version/s: 2.0.0

> Enable DDL statement for non-native tables (add/remove table properties)
> 
>
> Key: HIVE-16372
> URL: https://issues.apache.org/jira/browse/HIVE-16372
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16372.01.patch, HIVE-16372.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16372) Enable DDL statement for non-native tables (add/remove table properties)

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16372:
---
Fix Version/s: 3.0.0

> Enable DDL statement for non-native tables (add/remove table properties)
> 
>
> Key: HIVE-16372
> URL: https://issues.apache.org/jira/browse/HIVE-16372
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16372.01.patch, HIVE-16372.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16769) Possible hive service startup failure due to the existing file /tmp/stderr

2017-05-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032117#comment-16032117
 ] 

Hive QA commented on HIVE-16769:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870648/HIVE-16769.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10808 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5489/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5489/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5489/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870648 - PreCommit-HIVE-Build

> Possible hive service startup failure due to the existing file /tmp/stderr
> --
>
> Key: HIVE-16769
> URL: https://issues.apache.org/jira/browse/HIVE-16769
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16769.1.patch, HIVE-16769.2.patch
>
>
> HIVE-12497 prints the ignorable errors from hadoop version, hbase mapredcp and 
> hadoop jars to /tmp/$USER/stderr. 
> In some cases $USER is not set, so the file becomes /tmp/stderr. If such a 
> file preexists with different permissions, it will cause the service startup 
> to fail.
> I just tried the script without outputting to the stderr file, and I no longer 
> see the error {{"ERROR StatusLogger No log4j2 configuration file found. 
> Using default configuration: logging only errors to the console."}}.
> I think we can remove this redirect to avoid a possible startup failure.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16052) MM tables: add exchange partition test after ACID integration

2017-05-31 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng resolved HIVE-16052.
--
   Resolution: Fixed
Fix Version/s: hive-14535

> MM tables: add exchange partition test after ACID integration
> -
>
> Key: HIVE-16052
> URL: https://issues.apache.org/jira/browse/HIVE-16052
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
> Fix For: hive-14535
>
>
> exchgpartition2lel test fails if all tables are changed to MM, because of 
> write ID mismatch between directories and tables when exchanging partition 
> directories between tables. ACID should probably fix this because transaction 
> IDs are global.
> We should add a test after integrating with ACID; if it doesn't work for some 
> other reason, we can either implement it as moving to a new mm_id/txn_id in 
> each affected partition, or block it on MM tables.
> cc [~wzheng]
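
For reference, a minimal HiveQL sketch of the statement such a test would exercise (table and partition names are made up, not taken from exchgpartition2lel):
{code}
-- illustrative only: this moves the partition directory from source_t into target_t,
-- which is exactly where MM/ACID write IDs on directories and tables must stay consistent
ALTER TABLE target_t EXCHANGE PARTITION (ds='2017-05-31') WITH TABLE source_t;
{code}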



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16052) MM tables: add exchange partition test after ACID integration

2017-05-31 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032112#comment-16032112
 ] 

Wei Zheng commented on HIVE-16052:
--

I've run this test with all tables created as MM. The test ran fine and the 
result is correct. Closing this one.

> MM tables: add exchange partition test after ACID integration
> -
>
> Key: HIVE-16052
> URL: https://issues.apache.org/jira/browse/HIVE-16052
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
> Fix For: hive-14535
>
>
> exchgpartition2lel test fails if all tables are changed to MM, because of 
> write ID mismatch between directories and tables when exchanging partition 
> directories between tables. ACID should probably fix this because transaction 
> IDs are global.
> We should add a test after integrating with ACID; if it doesn't work for some 
> other reason, we can either implement it as moving to a new mm_id/txn_id in 
> each affected partition, or block it on MM tables.
> cc [~wzheng]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16052) MM tables: add exchange partition test after ACID integration

2017-05-31 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032113#comment-16032113
 ] 

Wei Zheng commented on HIVE-16052:
--

p.s. This is after ACID integration.

> MM tables: add exchange partition test after ACID integration
> -
>
> Key: HIVE-16052
> URL: https://issues.apache.org/jira/browse/HIVE-16052
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
> Fix For: hive-14535
>
>
> exchgpartition2lel test fails if all tables are changed to MM, because of 
> write ID mismatch between directories and tables when exchanging partition 
> directories between tables. ACID should probably fix this because transaction 
> IDs are global.
> We should add a test after integrating with ACID; if it doesn't work for some 
> other reason, we can either implement it as moving to a new mm_id/txn_id in 
> each affected partition, or block it on MM tables.
> cc [~wzheng]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

2017-05-31 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032067#comment-16032067
 ] 

liyunzhang_intel commented on HIVE-16600:
-

[~lirui]: thanks for the review. I think my algorithm in the v9 patch will consider 
the above case a multi-insert case, as there is more than 1 path from RS to FS
{code}
RS - ... - FS
   - ... - FS
   - ... - Non FS
{code}

{code}
// The multi-insert case looks like
//   TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
//                     -SEL[6]-LIM[7]-RS[8]-SEL[9]-LIM[10]-FS[11]
// Verify the multi-insert case: if there is more than 1 path from the RS (RS[2])
// to an FS in the operator tree, it is a multi-insert case.
private boolean isMultiInsert(ReduceSinkOperator rs) {
  int pathToFSNum = 0;
  // breadth-first walk of the operators below the ReduceSink
  Deque<Operator<? extends OperatorDesc>> childQueue = new LinkedList<>();
  childQueue.addAll(rs.getChildOperators());
  while (!childQueue.isEmpty()) {
    Operator<? extends OperatorDesc> child = childQueue.pop();
    if (child instanceof FileSinkOperator) {
      pathToFSNum = pathToFSNum + 1;
    } else {
      childQueue.addAll(child.getChildOperators());
    }
  }
  boolean isMultiInsert = pathToFSNum > 1;
  LOG.debug("reducesink:" + rs + " isMultiInsert:" + isMultiInsert);
  return isMultiInsert;
}
{code}
What I am confused about is: is the above case not a multi-insert case? Is it a 
case about Spark dynamic partition pruning?
bq.Besides, I'm wondering whether it's better to avoid such order by in 
sub-queries in the first place, as it is essentially pointless.
Agree


> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel 
> order by in multi_insert cases
> 
>
> Key: HIVE-16600
> URL: https://issues.apache.org/jira/browse/HIVE-16600
> Project: Hive
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16600.1.patch, HIVE-16600.2.patch, 
> HIVE-16600.3.patch, HIVE-16600.4.patch, HIVE-16600.5.patch, 
> HIVE-16600.6.patch, HIVE-16600.7.patch, HIVE-16600.8.patch, 
> HIVE-16600.9.patch, mr.explain, mr.explain.log.HIVE-16600
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> the parallelism of Sort is 1 even when we enable parallel order 
> by ("hive.optimize.sampling.orderby" is set to "true"). This is not 
> reasonable because the parallelism should be calculated by 
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170]
> this is because SetSparkReducerParallelism#needSetParallelism returns false 
> when [children size of 
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
>  is greater than 1.
> in this case, the children size of {{RS[2]}} is two.
> the logical plan of the case
> {code}
>TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
> -SEL[6]-FS[7]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16775) Fix HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16775:
---
Status: Open  (was: Patch Available)

> Fix HiveFilterAggregateTransposeRule when filter is always false
> 
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch, HIVE-16775.02.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16775) Fix HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16775:
---
Attachment: HIVE-16775.02.patch

> Fix HiveFilterAggregateTransposeRule when filter is always false
> 
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch, HIVE-16775.02.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16775) Fix HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16775:
---
Status: Patch Available  (was: Open)

> Fix HiveFilterAggregateTransposeRule when filter is always false
> 
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch, HIVE-16775.02.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16797) Support a new rule RemoveUnionBranchRule

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16797:
---
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-16762

> Support a new rule RemoveUnionBranchRule
> 
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> In query4.q, we can see that it creates a CTE with a union all of 3 branches. 
> Then it is going to do a 3-way self-join of the CTE with predicates. The 
> predicates actually specify only one of the branches in the CTE to participate in 
> the join. Thus, in some cases, e.g.,
> {code}
>            /- filter(false) - TS0
> union all  - filter(false) - TS1
>            \- TS2
> {code}
> we can cut the branches of TS0 and TS1. The union becomes only TS2.
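
A HiveQL sketch of the intended simplification (table names are made up; the rule itself works on the Calcite plan rather than on SQL text):
{code}
-- branches guarded by an always-false filter contribute no rows ...
select * from ts0 where 1 = 0
union all
select * from ts1 where 1 = 0
union all
select * from ts2;

-- ... so the union can be reduced to the single surviving branch
select * from ts2;
{code}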



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16797) Support a new rule RemoveUnionBranchRule

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16797:
--


> Support a new rule RemoveUnionBranchRule
> 
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> In query4.q, we can see that it creates a CTE with a union all of 3 branches. 
> Then it is going to do a 3-way self-join of the CTE with predicates. The 
> predicates actually specify only one of the branches in the CTE to participate in 
> the join. Thus, in some cases, e.g.,
> {code}
>            /- filter(false) - TS0
> union all  - filter(false) - TS1
>            \- TS2
> {code}
> we can cut the branches of TS0 and TS1. The union becomes only TS2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031997#comment-16031997
 ] 

Pengcheng Xiong commented on HIVE-16775:


I will continue with solution (a).

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16775) Fix HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16775:
---
Summary: Fix HiveFilterAggregateTransposeRule when filter is always false  
(was: Skip HiveFilterAggregateTransposeRule when filter is always false)

> Fix HiveFilterAggregateTransposeRule when filter is always false
> 
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031995#comment-16031995
 ] 

Pengcheng Xiong commented on HIVE-16775:


[~ashutoshc], the NullScanOptimizer does not fire. The main reason is that, 
although we specify ts *. filter in the rule, we are already in the physical 
optimization phase, where the operator tree has already been cut. Thus, we will have 
ts-sel-gb-rs and then gb-filter, and gb-filter will not match.

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16769) Possible hive service startup failure due to the existing file /tmp/stderr

2017-05-31 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16769:

Attachment: HIVE-16769.2.patch

patch-2: redirect possible errors to the stderr console and let the hive script 
decide where to output them. I think that makes more sense.

> Possible hive service startup failure due to the existing file /tmp/stderr
> --
>
> Key: HIVE-16769
> URL: https://issues.apache.org/jira/browse/HIVE-16769
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16769.1.patch, HIVE-16769.2.patch
>
>
> HIVE-12497 prints the ignorable errors from hadoop version, hbase mapredcp and 
> hadoop jars to /tmp/$USER/stderr. 
> In some cases $USER is not set, so the file becomes /tmp/stderr. If such a 
> file preexists with different permissions, it will cause the service startup 
> to fail.
> I just tried the script without outputting to the stderr file, and I no longer 
> see the error {{"ERROR StatusLogger No log4j2 configuration file found. 
> Using default configuration: logging only errors to the console."}}.
> I think we can remove this redirect to avoid a possible startup failure.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16769) Possible hive service startup failure due to the existing file /tmp/stderr

2017-05-31 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031963#comment-16031963
 ] 

Naveen Gangam commented on HIVE-16769:
--

The change makes sense to me. 
Redirecting output that is meant for a process' STDERR into a file in 
/tmp can be a problem for users at certain financial institutions. They are 
generally required to save logs for extended periods, and defaulting to /tmp 
would not meet their requirements. Printing it to the console allows them to 
redirect it to a more "permanent" location via the process managers/monitors that 
are used to start/stop hive services. Hope this makes sense. So +1 from me.

[~prasanth_j] and [~gopalv] Please do share your concerns if you have any.

> Possible hive service startup failure due to the existing file /tmp/stderr
> --
>
> Key: HIVE-16769
> URL: https://issues.apache.org/jira/browse/HIVE-16769
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16769.1.patch
>
>
> HIVE-12497 prints the ignorable errors from hadoop version, hbase mapredcp and 
> hadoop jars to /tmp/$USER/stderr. 
> In some cases $USER is not set, so the file becomes /tmp/stderr. If such a 
> file preexists with different permissions, it will cause the service startup 
> to fail.
> I just tried the script without outputting to the stderr file, and I no longer 
> see the error {{"ERROR StatusLogger No log4j2 configuration file found. 
> Using default configuration: logging only errors to the console."}}.
> I think we can remove this redirect to avoid a possible startup failure.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10848) LLAP: Better handling of hostnames when sending heartbeats to the AM

2017-05-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031926#comment-16031926
 ] 

Sergey Shelukhin commented on HIVE-10848:
-

Hmm

> LLAP: Better handling of hostnames when sending heartbeats to the AM
> 
>
> Key: HIVE-10848
> URL: https://issues.apache.org/jira/browse/HIVE-10848
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Siddharth Seth
> Fix For: llap
>
>
> Daemons send an alive message to the listening co-ordinator - along with the 
> daemon's hostname, which is used to keep tasks alive.
> This can be problematic with hostname resolution if the AM and daemons end up 
> using different hostnames.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16782) Flaky Test: TestMiniLlapLocalCliDriver[subquery_scalar]

2017-05-31 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16782:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Vihang!

> Flaky Test: TestMiniLlapLocalCliDriver[subquery_scalar]
> ---
>
> Key: HIVE-16782
> URL: https://issues.apache.org/jira/browse/HIVE-16782
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 3.0.0
>
> Attachments: HIVE-16782.01.patch
>
>
> Failing since last few builds
> https://builds.apache.org/job/PreCommit-HIVE-Build/5462/testReport
> https://builds.apache.org/job/PreCommit-HIVE-Build/5453/testReport



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16782) Flaky Test: TestMiniLlapLocalCliDriver[subquery_scalar]

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031900#comment-16031900
 ] 

Ashutosh Chauhan commented on HIVE-16782:
-

+1

> Flaky Test: TestMiniLlapLocalCliDriver[subquery_scalar]
> ---
>
> Key: HIVE-16782
> URL: https://issues.apache.org/jira/browse/HIVE-16782
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16782.01.patch
>
>
> Failing since last few builds
> https://builds.apache.org/job/PreCommit-HIVE-Build/5462/testReport
> https://builds.apache.org/job/PreCommit-HIVE-Build/5453/testReport



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16749) Run YETUS in Docker container

2017-05-31 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031879#comment-16031879
 ] 

Peter Vary commented on HIVE-16749:
---

We might take a look at the Yetus docker support:
https://yetus.apache.org/documentation/0.3.0/precommit-advanced/#Docker_Support

I am not sure we can use it, but at least we have to check

> Run YETUS in Docker container
> -
>
> Key: HIVE-16749
> URL: https://issues.apache.org/jira/browse/HIVE-16749
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> Think about the pros and cons of running YETUS in a docker container:
> - Resources
> - Usage complexity
> - Yetus version changes
> - Findbugs
> - etc.
> If worthwhile, run YETUS in a docker container



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16164) Provide mechanism for passing HMS notification ID between transactional and non-transactional listeners.

2017-05-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031808#comment-16031808
 ] 

Sergio Peña commented on HIVE-16164:


[~sushanth] Thanks for the post notes. I will indeed need your eyes on other 
patches as well, so I will ping you. Regarding the hcat package: I'm good with 
the move. We're using the DbNotificationListeners for HMS replication too, and 
hcat does not seem related to that. I agree with the move.

> Provide mechanism for passing HMS notification ID between transactional and 
> non-transactional listeners.
> 
>
> Key: HIVE-16164
> URL: https://issues.apache.org/jira/browse/HIVE-16164
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.3.0, 3.0.0, 2.4.0
>
> Attachments: HIVE-16164.1.patch, HIVE-16164.2.patch, 
> HIVE-16164.3.patch, HIVE-16164.6.patch, HIVE-16164.7.patch, HIVE-16164.8.patch
>
>
> The HMS DB notification listener currently stores an event ID on the HMS 
> backend DB so that external applications (such as backup apps) can request 
> incremental notifications based on the last event ID requested.
> The HMS DB notification and backup applications are asynchronous. However, 
> there are times when applications may be required to be in sync with the 
> latest HMS event in order to process an action. These applications will 
> provide a listener implementation that is called by the HMS after an HMS 
> transaction has happened.
> The problem is that the listener running after the transaction (or during the 
> non-transactional context) may need the DB event ID in order to sync all 
> events happened previous to that event ID, but this ID is never passed to the 
> non-transactional listeners.
> We can pass this event information through the EnvironmentContext found on 
> each ListenerEvent implementations (such as CreateTableEvent), and send the 
> EnvironmentContext to the non-transactional listeners to get the event ID.
> The DbNotificationListener already knows the event ID after calling the 
> ObjectStore.addNotificationEvent(). We just need to set this event ID to the 
> EnvironmentContext from each of the event notifications and make sure that 
> this EnvironmentContext is sent to the non-transactional listeners.
> Here's the code example when creating a table on {{create_table_core}}:
> {noformat}
>  ms.createTable(tbl);
>   if (transactionalListeners.size() > 0) {
> CreateTableEvent createTableEvent = new CreateTableEvent(tbl, true, this);
> createTableEvent.setEnvironmentContext(envContext);
> for (MetaStoreEventListener transactionalListener : 
> transactionalListeners) {
>   transactionalListener.onCreateTable(createTableEvent); // <- 
> Here the notification ID is generated
> }
>   }
>   success = ms.commitTransaction();
> } finally {
>   if (!success) {
> ms.rollbackTransaction();
> if (madeDir) {
>   wh.deleteDir(tblPath, true);
> }
>   }
>   for (MetaStoreEventListener listener : listeners) {
> CreateTableEvent createTableEvent =
> new CreateTableEvent(tbl, success, this);
> createTableEvent.setEnvironmentContext(envContext);
> listener.onCreateTable(createTableEvent);// <- 
> Here we would like to consume notification ID
>   }
> {noformat}
> We could use a specific key name that will be used on the EnvironmentContext, 
> such as DB_NOTIFICATION_EVENT_ID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16771:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

I committed this to master.
Thanks [~vihangk1].

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch, HIVE-16771.04.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic, it should depend on the 
> IMetastoreSchemaInfo implementation which is configured to get the metastore 
> schema version information from the database. It should also not assume the 
> scripts directory and hardcode it itself; it should rather ask the 
> MetastoreSchemaInfo class for the metastore scripts directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16653) Mergejoin should give itself a correct tag

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031669#comment-16031669
 ] 

Ashutosh Chauhan commented on HIVE-16653:
-

I see. Then we should leave the if (stack.size() < 2) check at its original 
location, since immediately after that we look into the stack. Looks good 
otherwise.

> Mergejoin should give itself a correct tag
> --
>
> Key: HIVE-16653
> URL: https://issues.apache.org/jira/browse/HIVE-16653
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16653.01.patch
>
>
> In a non-self-join mergejoin, e.g., when we join tab_a and tab_b, it will give 
> different branches different tags. However, in a self-join mergejoin both of 
> the branches come from the same table scan, so we need to give it a correct tag.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16653) Mergejoin should give itself a correct tag

2017-05-31 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16653:
---
Description: In a non-self-join mergejoin, e.g., when we join tab_a and tab_b, 
it will give different branches different tags. However, in a self-join 
mergejoin both of the branches come from the same table scan, so we need to give 
it a correct tag.
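
A minimal HiveQL example of the self-join shape in question (table name is made up):
{code}
-- both merge-join branches originate from the same scan of t, so the join cannot
-- rely on distinct source operators to tell its sides apart; hence the explicit tag
select a.key, b.value
from t a
join t b on a.key = b.key;
{code}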

> Mergejoin should give itself a correct tag
> --
>
> Key: HIVE-16653
> URL: https://issues.apache.org/jira/browse/HIVE-16653
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16653.01.patch
>
>
> In a non-self-join mergejoin, e.g., when we join tab_a and tab_b, it will give 
> different branches different tags. However, in a self-join mergejoin both of 
> the branches come from the same table scan, so we need to give it a correct tag.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16372) Enable DDL statement for non-native tables (add/remove table properties)

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031647#comment-16031647
 ] 

Ashutosh Chauhan commented on HIVE-16372:
-

+1

> Enable DDL statement for non-native tables (add/remove table properties)
> 
>
> Key: HIVE-16372
> URL: https://issues.apache.org/jira/browse/HIVE-16372
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16372.01.patch, HIVE-16372.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16495) ColumnStats merge should consider the accuracy of the current stats

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031643#comment-16031643
 ] 

Ashutosh Chauhan commented on HIVE-16495:
-

Can you create a RB for it, if its ready ?

> ColumnStats merge should consider the accuracy of the current stats
> ---
>
> Key: HIVE-16495
> URL: https://issues.apache.org/jira/browse/HIVE-16495
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16495.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16495) ColumnStats merge should consider the accuracy of the current stats

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031645#comment-16031645
 ] 

Ashutosh Chauhan commented on HIVE-16495:
-

Can you create a RB for it, if its ready ?

> ColumnStats merge should consider the accuracy of the current stats
> ---
>
> Key: HIVE-16495
> URL: https://issues.apache.org/jira/browse/HIVE-16495
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16495.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16566) Set column stats default as true when creating new tables/partitions

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031640#comment-16031640
 ] 

Ashutosh Chauhan commented on HIVE-16566:
-

I suggest changing the name of the function setBasicStatsStateForCreateTable(), since 
it is neither setting only basic stats nor only for create table.
Other than the name change, +1

> Set column stats default as true when creating new tables/partitions
> 
>
> Key: HIVE-16566
> URL: https://issues.apache.org/jira/browse/HIVE-16566
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16566.01.patch, HIVE-16566.02.patch, 
> HIVE-16566.03.patch, HIVE-16566.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16629) Print thread name when thread pool is used in Hive.java

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031639#comment-16031639
 ] 

Ashutosh Chauhan commented on HIVE-16629:
-

These log messages should be at debug level and should include the name of the 
file they are moving.

> Print thread name when thread pool is used in Hive.java
> ---
>
> Key: HIVE-16629
> URL: https://issues.apache.org/jira/browse/HIVE-16629
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16629.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16653) Mergejoin should give itself a correct tag

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031633#comment-16031633
 ] 

Ashutosh Chauhan commented on HIVE-16653:
-

[~pxiong] Can you describe the problem a little more here? As I see it, you are 
updating the tag of the merge work even when this rule doesn't alter anything in 
the plan. That doesn't seem correct. 
Was it the case that the tag was not set at all on that merge work? If so, then it 
should have been set when it was created. If it was set, then there shouldn't be 
a need to update it in this rule, since you are updating it even when the rule 
doesn't alter the plan at all.

> Mergejoin should give itself a correct tag
> --
>
> Key: HIVE-16653
> URL: https://issues.apache.org/jira/browse/HIVE-16653
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16653.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16789) Query with window function fails when where clause returns no records

2017-05-31 Thread Narayana (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031626#comment-16031626
 ] 

Narayana commented on HIVE-16789:
-

Yes, it fails even with the Tez engine.
{code}

VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..      SUCCEEDED      1          1        0        0       0       0
Map 4 ..      SUCCEEDED      1          1        0        0       0       0
Map 5 ..      SUCCEEDED      1          1        0        0       0       0
Map 6 ..      SUCCEEDED      1          1        0        0       0       0
Map 7 ..      SUCCEEDED      1          1        0        0       0       0
Reducer 3         FAILED      1          0        0        1       4       0

VERTICES: 05/06  [=>>-] 83%   ELAPSED TIME: 20.68 s

Status: Failed
Vertex failed, vertexName=Reducer 3, vertexId=vertex_1495311508015_0744_1_05, 
diagnostics=[Task failed, taskId=task_1495311508015_0744_1_05_00, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error 
while closing operators: null
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
operators: null
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:214)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:177)
... 14 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayDeque.getFirst(Unknown Source)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:280)
at 
org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:196)
... 15 more
{code}

> Query with window function fails when where clause returns no records
> -
>
> Key: HIVE-16789
> URL: https://issues.apache.org/jira/browse/HIVE-16789
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.1, 1.2.1
> Environment: OS: CentOS release 6.6 (Final)
> HDP: 2.7.3
> Hive: 1.0.1,1.2.1
>Reporter: Narayana
>
> When the outer where clause finds no matching records, queries like these 
> fail
> {code}
> select 
> a,b,c,d,e,
> min(c) over (partition by a,b order by d rows between 1 preceding and current 
> row) prev_c
> from
> (
> select '1234' a ,'abc' b,10 c,2 d,'test1' e
> union all 
> select '1234' a ,'abc' b,9 c,1 d,'test2' e
> union all
> select '1234' a ,'abc' b,11 c,3 d,'test2' e
> union all  
> select '1234' a ,'abcd' b,1 c,5 d,'test2' e

[jira] [Commented] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031604#comment-16031604
 ] 

Ashutosh Chauhan commented on HIVE-16775:
-

If NullScanOptimizer is kicking in, then the perf problem is not as bad. Can you 
verify that? Then we can check this in and take up pushing the filter past 
aggregates in a follow-up.

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16654) Optimize a combination of avg(), sum(), count(distinct) etc

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031602#comment-16031602
 ] 

Ashutosh Chauhan commented on HIVE-16654:
-

+1

> Optimize a combination of avg(), sum(), count(distinct) etc
> ---
>
> Key: HIVE-16654
> URL: https://issues.apache.org/jira/browse/HIVE-16654
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16654.01.patch, HIVE-16654.02.patch, 
> HIVE-16654.03.patch, HIVE-16654.04.patch
>
>
> an example rewrite for q28 of tpcds is 
> {code}
> (select LP as B1_LP ,CNT  as B1_CNT,CNTD as B1_CNTD
>   from (select sum(xc0) / sum(xc1) as LP, sum(xc1) as CNT, count(1) as 
> CNTD from (select sum(ss_list_price) as xc0, count(ss_list_price) as xc1 from 
> store_sales  where 
> ss_list_price is not null and ss_quantity between 0 and 5
> and (ss_list_price between 11 and 11+10 
>  or ss_coupon_amt between 460 and 460+1000
>  or ss_wholesale_cost between 14 and 14+20)
>  group by ss_list_price) ss0) ss1) B1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031581#comment-16031581
 ] 

Pengcheng Xiong commented on HIVE-16775:


[~ashutoshc], there is a NullScanOptimizer which will fire in this case, thus I 
think the performance penalty is not significant. However, I think (a) may work, as 
I think we can generate a filter(false) rather than an empty Values by rewriting 
the rule. (b) basically means supporting values(), which may need more work.

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16625) Extra '\0' characters in the output, when SeparatedValuesOutputFormat is used and the quoting is disabled

2017-05-31 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16625:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Peter.

> Extra '\0' characters in the output, when SeparatedValuesOutputFormat is used 
> and the quoting is disabled
> -
>
> Key: HIVE-16625
> URL: https://issues.apache.org/jira/browse/HIVE-16625
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, Testing Infrastructure
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: HIVE-16625.02.patch, HIVE-16625.03.patch, 
> HIVE-16625.patch
>
>
> If the output format is using {{SeparatedValuesOutputFormat}}, the 
> quoting is disabled (it is disabled by default), and the value of the cell 
> contains the separator character, then the output is "quoted" with '\0' 
> characters.
> To reproduce:
> {code}
> create table quotes(s string);
> insert into quotes values('a\ta');
> !set outputFormat tsv2
> select * from quotes;
> {code}
> The result is:
> {code}
> quotes.s
> ^@a   a^@
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-31 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031514#comment-16031514
 ] 

Naveen Gangam commented on HIVE-16771:
--

Changes look good to me as well. So +1 for me.

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch, HIVE-16771.04.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic, it should depend on the 
> IMetastoreSchemaInfo implementation which is configured to get the metastore 
> schema version information from the database. It should also not assume the 
> scripts directory and hardcode it itself; it should rather ask the 
> MetastoreSchemaInfo class for the metastore scripts directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16591) DR for function Binaries on HDFS

2017-05-31 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-16591:
-
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks Anishek!


> DR for function Binaries on HDFS 
> -
>
> Key: HIVE-16591
> URL: https://issues.apache.org/jira/browse/HIVE-16591
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16591.1.patch, HIVE-16591.2.patch, 
> HIVE-16591.3.patch
>
>
> # We have to make sure that during an incremental dump we don't allow functions 
> to be copied if they have local filesystem "file://" resources. -- This depends on 
> how much system-side work we want to do. We are going to explicitly provide a 
> caveat for replicating functions: only functions created with a "using" 
> clause will be replicated, and the "using" clause prohibits creating functions 
> with local "file://" resources, so doing additional checks when 
> doing repl dump might not be required. 
> # We have to make sure that during the bootstrap / incremental dump we append 
> the namenode host + port if functions are created without the fully 
> qualified URI location on HDFS; not sure how this would play out for S3 or 
> WASB filesystems.
> # We have to copy the binaries of a function's resource list on CREATE / DROP 
> FUNCTION. The change management file system has to keep a copy of the binary 
> when a DROP FUNCTION is called, to provide the capability of updating the binary 
> definition for existing functions along with DR. An example list of steps 
> is given in the doc (ReplicateFunctions.pdf) attached to the parent Issue.
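
A HiveQL sketch of the distinction described above (database, function, class and jar names are made up):
{code}
-- eligible for replication: created with a "using" clause and an HDFS resource;
-- if the URI were not fully qualified, the dump would append the namenode host/port
CREATE FUNCTION repl_db.my_udf AS 'com.example.MyUDF'
  USING JAR 'hdfs://nn-host:8020/udfs/my-udf.jar';

-- local "file://" resources are already prohibited by the "using" clause path,
-- so such functions would not be replicated
{code}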



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031339#comment-16031339
 ] 

Ashutosh Chauhan commented on HIVE-16775:
-

While I agree that throwing an exception and failing the query is bad, we will miss a 
huge optimization opportunity if we don't push an always-false filter past the 
aggregate. Imagine doing a billion-row GBy and throwing the result away immediately 
because there is a false filter right after it. We ought to do better. 
There are 2 options as I see it: a) Override the Calcite rule and don't generate a 
values clause in the rule. Perhaps replace it with a select operator.
b) We already have a values clause in our grammar, so on the AST we can generate a 
values token with an empty set. Then during genPlan() we can call 
genTablePlan(DUMMY_TABLE, qb) for the values.
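
A HiveQL illustration of the opportunity (table name is made up; the actual transformation happens on the Calcite plan):
{code}
-- without the transpose, the large group by runs and its output is then discarded
select k, total
from (select k, sum(v) as total from big_t group by k) g
where 1 = 0;

-- pushing the always-false filter below the aggregate makes the group by input empty
select k, sum(v) as total
from (select k, v from big_t where 1 = 0) s
group by k;
{code}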

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031287#comment-16031287
 ] 

Ashutosh Chauhan commented on HIVE-16774:
-

+1

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch, HIVE-16774.02.patch
>
>
> query47.q query57.q
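
A minimal HiveQL illustration of the feature (table name is made up; assumes position aliasing is enabled in the configuration):
{code}
-- "2" refers to the second column of the select list expanded from *
select * from src_t order by 2 desc;
{code}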



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16776) Strange cast behavior for table backed by druid

2017-05-31 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16776:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> Strange cast behavior for table backed by druid
> ---
>
> Key: HIVE-16776
> URL: https://issues.apache.org/jira/browse/HIVE-16776
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16776.patch
>
>
> The following query 
> {code} 
> explain select SUBSTRING(`Calcs`.`str0`,CAST(`Calcs`.`int2` AS int), 3) from 
> `druid_tableau`.`calcs` `Calcs`;
> OK
> Plan not optimized by CBO. 
> {code}
> fails the cbo with the following exception 
> {code} org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong 
> arguments '3': No matching method for class 
> org.apache.hadoop.hive.ql.udf.UDFSubstr with (string, bigint, int). Po
> ssible choices: _FUNC_(binary, int)  _FUNC_(binary, int, int)  _FUNC_(string, 
> int)  _FUNC_(string, int, int)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1355)
>  ~[hive-exec-2.1.0.2.6.0.2-SNAPSHOT.jar:2.1.0.2.6.0.2-SNA
> PSHOT]{code}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16776) Strange cast behavior for table backed by druid

2017-05-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031270#comment-16031270
 ] 

Ashutosh Chauhan commented on HIVE-16776:
-

+1

> Strange cast behavior for table backed by druid
> ---
>
> Key: HIVE-16776
> URL: https://issues.apache.org/jira/browse/HIVE-16776
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16776.patch
>
>
> The following query 
> {code} 
> explain select SUBSTRING(`Calcs`.`str0`,CAST(`Calcs`.`int2` AS int), 3) from 
> `druid_tableau`.`calcs` `Calcs`;
> OK
> Plan not optimized by CBO. 
> {code}
> fails the CBO with the following exception
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong arguments
> '3': No matching method for class org.apache.hadoop.hive.ql.udf.UDFSubstr
> with (string, bigint, int). Possible choices: _FUNC_(binary, int)
> _FUNC_(binary, int, int)  _FUNC_(string, int)  _FUNC_(string, int, int)
> at
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1355)
>  ~[hive-exec-2.1.0.2.6.0.2-SNAPSHOT.jar:2.1.0.2.6.0.2-SNAPSHOT]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16789) Query with window function fails when where clause returns no records

2017-05-31 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031135#comment-16031135
 ] 

Carter Shanklin commented on HIVE-16789:


Does the error occur when using Tez instead of MapReduce (set 
hive.execution.engine=tez;)?

> Query with window function fails when where clause returns no records
> -
>
> Key: HIVE-16789
> URL: https://issues.apache.org/jira/browse/HIVE-16789
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.1, 1.2.1
> Environment: OS: CentOS release 6.6 (Final)
> HDP: 2.7.3
> Hive: 1.0.1,1.2.1
>Reporter: Narayana
>
> When the outer where clause finds no matching records, queries like the
> following fail:
> {code}
> select 
> a,b,c,d,e,
> min(c) over (partition by a,b order by d rows between 1 preceding and current 
> row) prev_c
> from
> (
> select '1234' a ,'abc' b,10 c,2 d,'test1' e
> union all 
> select '1234' a ,'abc' b,9 c,1 d,'test2' e
> union all
> select '1234' a ,'abc' b,11 c,3 d,'test2' e
> union all  
> select '1234' a ,'abcd' b,1 c,5 d,'test2' e
> union all 
> select '1234' a ,'abcd' b,6 c,9 d,'test1' e
> )X
> where e='test3'
> ;
> {code}
> Error:
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: null
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.util.NoSuchElementException
>   at java.util.ArrayDeque.getFirst(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:280)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>   ... 7 more
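
The NoSuchElementException above appears to come from ArrayDeque.getFirst() being called on an empty deque when the filtered partition contains no rows. As a minimal, self-contained sketch (illustrative only, not the actual GenericUDAFMax/MaxStreamingFixedWindow code), this is the kind of empty-partition guard the streaming evaluator needs; all class and method names below are hypothetical:
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a streaming window aggregator that keeps
// candidate maxima in a deque. With zero input rows the deque stays empty,
// so terminate() must not call getFirst() unconditionally.
public class StreamingMaxSketch {
  private final Deque<Integer> candidates = new ArrayDeque<>();

  public void iterate(int value) {
    // drop smaller tail candidates before appending (monotonic deque)
    while (!candidates.isEmpty() && candidates.getLast() < value) {
      candidates.removeLast();
    }
    candidates.addLast(value);
  }

  public Integer terminate() {
    // guard: an empty partition yields NULL instead of NoSuchElementException
    return candidates.isEmpty() ? null : candidates.getFirst();
  }
}
{code}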



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16789) Query with window function fails when where clause returns no records

2017-05-31 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031135#comment-16031135
 ] 

Carter Shanklin edited comment on HIVE-16789 at 5/31/17 1:18 PM:
-

Does the error occur when using Tez instead of MapReduce?

{code}
set hive.execution.engine=tez;
{code}


was (Author: cartershanklin):
Does the error occur when using Tez instead of MapReduce (set 
hive.execution.engine=tez;)?

> Query with window function fails when where clause returns no records
> -
>
> Key: HIVE-16789
> URL: https://issues.apache.org/jira/browse/HIVE-16789
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.1, 1.2.1
> Environment: OS: CentOS release 6.6 (Final)
> HDP: 2.7.3
> Hive: 1.0.1,1.2.1
>Reporter: Narayana
>
> When the outer where clause finds no matching records, queries like the
> following fail:
> {code}
> select 
> a,b,c,d,e,
> min(c) over (partition by a,b order by d rows between 1 preceding and current 
> row) prev_c
> from
> (
> select '1234' a ,'abc' b,10 c,2 d,'test1' e
> union all 
> select '1234' a ,'abc' b,9 c,1 d,'test2' e
> union all
> select '1234' a ,'abc' b,11 c,3 d,'test2' e
> union all  
> select '1234' a ,'abcd' b,1 c,5 d,'test2' e
> union all 
> select '1234' a ,'abcd' b,6 c,9 d,'test1' e
> )X
> where e='test3'
> ;
> {code}
> Error:
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: null
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.util.NoSuchElementException
>   at java.util.ArrayDeque.getFirst(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:280)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>   ... 7 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-05-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031049#comment-16031049
 ] 

Hive QA commented on HIVE-6348:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870560/HIVE-6348.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 134 failed/errored test(s), 10804 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join0] (batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join15] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join20] (batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join31] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join0] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_union] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_windowing] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_union] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_windowing] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[concat_op] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer14] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_3] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_distinct_samekey]
 (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets_grouping]
 (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[identity_project_remove_skip]
 (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input20] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input33] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input3_limit] 
(batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[limit_pushdown_negative] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[list_bucket_dml_8] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[macro_duplicate] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapreduce2] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby2] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby3] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_schema_evol_2b] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[order_by_expr_1] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[order_by_expr_2] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partcols1] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_varchar1] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcr] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup2] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup3] (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd2] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join4] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_udf_case] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reducesink_dedup] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rename_external_partition_location]
 (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[truncate_column] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[truncate_column_buckets] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_paren] (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_data_types] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_3] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round_2] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_1] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_non_string_partition]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[v

[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

2017-05-31 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031007#comment-16031007
 ] 

Rui Li commented on HIVE-16600:
---

Hi [~kellyzly], thanks for the example. As I said, I agree that disabling parallel
orderBy for non-multi-insert queries is conservative; I just wanted to limit the
scope of this JIRA. Another thing I'm not sure about in your patch is how it
handles a case like the following. Will it be treated as a multi insert?
{noformat}
RS - ... - FS
   - ... - FS
   - ... - Non FS
{noformat}

Besides, I'm wondering whether it's better to avoid such an order by in
sub-queries in the first place, as it is essentially pointless. Other DBs, like
SQL Server, do this. I'm investigating it in HIVE-6348.
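
To make the question about the plan shape concrete, here is a small hypothetical helper (not part of the HIVE-16600 patch, shown only for illustration) that walks an operator subtree and reports whether every leaf is a FileSinkOperator; under such a check, the RS above with two FS branches and one non-FS branch would not count as a pure multi insert. It assumes the standard Operator#getChildOperators API from hive-exec:
{code}
import java.util.List;

import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.plan.OperatorDesc;

// Hypothetical illustration only: returns true iff every leaf operator
// reachable from "op" is a FileSinkOperator.
public final class MultiInsertShapeCheck {
  private MultiInsertShapeCheck() {}

  public static boolean allLeavesAreFileSinks(Operator<? extends OperatorDesc> op) {
    List<Operator<? extends OperatorDesc>> children = op.getChildOperators();
    if (children == null || children.isEmpty()) {
      // a leaf: only FileSink leaves qualify
      return op instanceof FileSinkOperator;
    }
    for (Operator<? extends OperatorDesc> child : children) {
      if (!allLeavesAreFileSinks(child)) {
        return false;
      }
    }
    return true;
  }
}
{code}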

> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel 
> order by in multi_insert cases
> 
>
> Key: HIVE-16600
> URL: https://issues.apache.org/jira/browse/HIVE-16600
> Project: Hive
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16600.1.patch, HIVE-16600.2.patch, 
> HIVE-16600.3.patch, HIVE-16600.4.patch, HIVE-16600.5.patch, 
> HIVE-16600.6.patch, HIVE-16600.7.patch, HIVE-16600.8.patch, 
> HIVE-16600.9.patch, mr.explain, mr.explain.log.HIVE-16600
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> the parallelism of Sort is 1 even when we enable parallel order by
> ("hive.optimize.sampling.orderby" is set to "true"). This is not reasonable
> because the parallelism should be calculated by
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170].
> This happens because SetSparkReducerParallelism#needSetParallelism returns false
> when the [children size of
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
> is greater than 1. In this case, the children size of {{RS[2]}} is two.
> the logical plan of the case
> {code}
>TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
> -SEL[6]-FS[7]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort

2017-05-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030984#comment-16030984
 ] 

Hive QA commented on HIVE-16460:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870551/HIVE-16460.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10790 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=103)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5487/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5487/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5487/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870551 - PreCommit-HIVE-Build

> In the console output, show vertex list in topological order instead of an 
> alphabetical sort
> 
>
> Key: HIVE-16460
> URL: https://issues.apache.org/jira/browse/HIVE-16460
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16460.1.patch
>
>
> cc [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-6348) Order by/Sort by in subquery

2017-05-31 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-6348:
-
Status: Patch Available  (was: Open)

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any
> order by/sort by in the sub-query unless you use 'limit'. We could even go so
> far as barring it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-6348) Order by/Sort by in subquery

2017-05-31 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-6348:
-
Attachment: HIVE-6348.1.patch

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any
> order by/sort by in the sub-query unless you use 'limit'. We could even go so
> far as barring it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-6348) Order by/Sort by in subquery

2017-05-31 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-6348:


Assignee: Rui Li

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any
> order by/sort by in the sub-query unless you use 'limit'. We could even go so
> far as barring it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15051) Test framework integration with findbugs, rat checks etc.

2017-05-31 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030940#comment-16030940
 ] 

Peter Vary commented on HIVE-15051:
---

Test failures are not related :)

[~kgyrtkirk]: Could you please take a look at it?

Thanks,
Peter

> Test framework integration with findbugs, rat checks etc.
> -
>
> Key: HIVE-15051
> URL: https://issues.apache.org/jira/browse/HIVE-15051
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: beeline.out, HIVE-15051.02.patch, HIVE-15051.patch, 
> Interim.patch, ql.out
>
>
> Find a way to integrate code analysis tools like findbugs and rat checks into
> the PreCommit tests, thus relieving reviewers of code style checks and other
> checks that could be done automatically.
> It might be worth taking a look at Yetus, but keep in mind that Hive has a
> specific parallel test framework.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort

2017-05-31 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16460:
-
Status: Patch Available  (was: Open)

> In the console output, show vertex list in topological order instead of an 
> alphabetical sort
> 
>
> Key: HIVE-16460
> URL: https://issues.apache.org/jira/browse/HIVE-16460
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16460.1.patch
>
>
> cc [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort

2017-05-31 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16460:
-
Attachment: HIVE-16460.1.patch

[~sseth] could you take a look?

> In the console output, show vertex list in topological order instead of an 
> alphabetical sort
> 
>
> Key: HIVE-16460
> URL: https://issues.apache.org/jira/browse/HIVE-16460
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16460.1.patch
>
>
> cc [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort

2017-05-31 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-16460:


Assignee: Prasanth Jayachandran

> In the console output, show vertex list in topological order instead of an 
> alphabetical sort
> 
>
> Key: HIVE-16460
> URL: https://issues.apache.org/jira/browse/HIVE-16460
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
>
> cc [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails

2017-05-31 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030853#comment-16030853
 ] 

liyunzhang_intel commented on HIVE-16780:
-

[~xuefuz], [~csun]: I guess this exception is caused by the modification of
[DynamicPartitionPruning|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java#L266]
after HIVE-15269. The NPE is thrown from
org.apache.hadoop.hive.ql.exec.mr.ObjectCache#retrieve(java.lang.String):
{code}
  @Override
  public <T> T retrieve(String key) throws HiveException {
    return retrieve(key, null);
  }

  @Override
  public <T> T retrieve(String key, Callable<T> fn) throws HiveException {
    try {
      if (isDebugEnabled) {
        LOG.debug("Creating " + key);
      }
      return fn.call();  // NPE is thrown here
    } catch (Exception e) {
      throw new HiveException(e);
    }
  }
{code}
Comparing with 
org.apache.hadoop.hive.ql.exec.tez.ObjectCache#retrieve(java.lang.String)
{code}
  public <T> T retrieve(String key) throws HiveException {
    T value = null;
    try {
      value = (T) registry.get(key);
      if (value != null) {
        LOG.info("Found " + key + " in cache with value: " + value);
      }
    } catch (Exception e) {
      throw new HiveException(e);
    }
    return value;
  }
{code}

If we want to fix this, we need to add code similar to what Hive on Tez does
(DynamicValueRegistryTez, RegistryConfTez). I would appreciate it if you could
give some suggestions.
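
For illustration only, here is a minimal stand-alone sketch of the null-guard idea (not a proposed patch for org.apache.hadoop.hive.ql.exec.mr.ObjectCache, and using plain Exception instead of HiveException so it compiles on its own): retrieve(key) delegates with fn == null, so fn.call() has to be guarded the same way the Tez cache tolerates a key that is missing from its registry. Whether returning null is acceptable here, or whether Spark needs its own analogue of DynamicValueRegistryTez/RegistryConfTez as suggested above, remains the open question.
{code}
import java.util.concurrent.Callable;

// Hypothetical stand-in for the MR ObjectCache quoted above. retrieve(key)
// delegates with fn == null, so the two-argument overload must not call
// fn.call() unconditionally.
public class NullTolerantCacheSketch {
  public <T> T retrieve(String key) throws Exception {
    return retrieve(key, null);
  }

  public <T> T retrieve(String key, Callable<T> fn) throws Exception {
    if (fn == null) {
      // mirror the Tez cache's behaviour of returning null when nothing is registered
      return null;
    }
    return fn.call();
  }
}
{code}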


> Case "multiple sources, single key" in spark_dynamic_pruning.q fails 
> -
>
> Key: HIVE-16780
> URL: https://issues.apache.org/jira/browse/HIVE-16780
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
> script.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.dynamic.partition.pruning=true;
> -- multiple sources, single key
> select count(*) from srcpart join srcpart_date on (srcpart.ds = 
> srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr)
> {code}
> if "hive.optimize.index.filter" is disabled, the case passes; otherwise it
> always hangs in the first job. Exception:
> {code}
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger: </PERFLOG method=SparkInitializeOperators start=1495899585574 end=1495899585933
> duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = 
> hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan 
> in cache for name: map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting 
> to datanode 10.239.47.162:50010
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing 
> alias(es) srcpart_hour for file 
> hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating 
> root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry
> 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: 
> Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
> Runtime Error while processing row {"hr":"11","hour":"11"}
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"hr":"11","hour":"11"}
>  at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachA

[jira] [Updated] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-31 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16788:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16788.patch
>
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK side
> leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE',
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0,
> u'customer_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal from either way. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2
> HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler:
> MetaException(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy26.getForeignKeys(Unknown Source)
> at 
> org.apache.hive.service.cli.operation.GetCrossReferenceOperation.runInternal(GetCrossReferenceOperation.java:128)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getCrossReference(HiveSessionImpl.java:933)
> at 
> org.apache.hive.service.cli.CLIService.getCrossReference(CLIService.java:411)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetCrossReference(ThriftCLIService.java:738)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1617)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1602)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadP