[jira] [Commented] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102789#comment-16102789 ] Hive QA commented on HIVE-17179: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879088/HIVE-17179.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=142) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6143/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6143/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6143/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited 
with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879088 - PreCommit-HIVE-Build > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17179.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102777#comment-16102777 ] Matt McCline commented on HIVE-12369: - #5 cratered. > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, > no more than 31 columns. > 3 new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
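The operator names above come from the patch; the underlying idea, counts stored directly in the slot table with no per-group hash element allocation, can be sketched as follows. This is a hypothetical toy (fixed power-of-two capacity, no resizing or spill handling), not the actual Hive classes:

```java
// Hypothetical sketch of a COUNT aggregation over a single long key.
// Counts live directly in parallel flat arrays (the "slot table"),
// so no per-group hash element objects are allocated.
public class LongKeyCountHash {
    private final long[] keys;
    private final long[] counts;
    private final boolean[] used;

    public LongKeyCountHash(int capacity) {
        // capacity must be a power of two for the mask-based probing below
        keys = new long[capacity];
        counts = new long[capacity];
        used = new boolean[capacity];
    }

    // Linear probing; the count is stored in the slot itself.
    public void add(long key) {
        int mask = keys.length - 1;
        int slot = Long.hashCode(key) & mask;
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        if (!used[slot]) {
            used[slot] = true;
            keys[slot] = key;
        }
        counts[slot]++;
    }

    // A vectorized operator hands over the whole key column of a batch at once.
    public void addBatch(long[] keyVector, int size) {
        for (int i = 0; i < size; i++) {
            add(keyVector[i]);
        }
    }

    public long countOf(long key) {
        int mask = keys.length - 1;
        int slot = Long.hashCode(key) & mask;
        while (used[slot]) {
            if (keys[slot] == key) {
                return counts[slot];
            }
            slot = (slot + 1) & mask;
        }
        return 0;
    }
}
```

The point of the design is allocation behavior: a batch of keys updates flat arrays in a tight loop, with no boxing and no hash-entry objects for the garbage collector to chase.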
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Status: In Progress (was: Patch Available) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, > no more than 31 columns. > 3 new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Attachment: HIVE-12369.06.patch > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, > no more than 31 columns. > 3 new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102762#comment-16102762 ] Rui Li commented on HIVE-16948: --- Is it possible that the DPP work doesn't contain branches, and therefore when the target work is gone, the whole DPP work/task should be removed? > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948_1.patch, HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > 
aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator
[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task
[ https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102718#comment-16102718 ] Hive QA commented on HIVE-17113: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879072/HIVE-17113.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6142/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6142/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6142/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12879072 - PreCommit-HIVE-Build > Duplicate bucket files can get written to table by runaway task > --- > > Key: HIVE-17113 > URL: https://issues.apache.org/jira/browse/HIVE-17113 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, > HIVE-17113.3.patch > > > Saw a table get a duplicate bucket file from a Hive query. It looks like the > following happened: > 1. Task attempt A_0 starts, but then stops making progress > 2. The job was running with speculative execution on, and task attempt A_1 is > started > 3. Task attempt A_1 finishes execution and saves its output to the temp > directory. > 4. A task kill is sent to A_0, though this does not appear to actually kill A_0 > 5. The job for the query finishes and Utilities.mvFileToFinalPath() calls > Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files > 6. A_0 (still running) finally finishes and saves its file to the temp > directory. At this point we now have duplicate bucket files - oops! > 7. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the > final location, where it is later moved to the partition directory. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
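One way to resolve such speculative-attempt races is to group output files by task ID and keep exactly one file per task. The sketch below is hypothetical, not the actual Utilities.removeTempOrDuplicateFiles() logic, and its "taskId_attempt" naming convention is illustrative; it keeps the largest file per task on the assumption that a runaway attempt's duplicate may be partially written:

```java
import java.util.Map;
import java.util.HashMap;
import java.util.TreeMap;

// Hypothetical deduplication of speculative-attempt output files.
// Input: file name (taskId_attempt) -> size. Output: one file per task.
public class DedupAttempts {
    public static Map<String, Long> keepOnePerTask(Map<String, Long> fileSizes) {
        Map<String, String> chosenName = new HashMap<>();
        Map<String, Long> chosenSize = new HashMap<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            // Task ID is the name minus the trailing attempt suffix (illustrative).
            String taskId = e.getKey().substring(0, e.getKey().lastIndexOf('_'));
            Long best = chosenSize.get(taskId);
            if (best == null || e.getValue() > best) {
                chosenName.put(taskId, e.getKey());
                chosenSize.put(taskId, e.getValue());
            }
        }
        // Rebuild the surviving file set (TreeMap for deterministic order).
        Map<String, Long> result = new TreeMap<>();
        for (String name : chosenName.values()) {
            result.put(name, fileSizes.get(name));
        }
        return result;
    }
}
```

Note that this only closes the window described above if it runs after all attempts have stopped writing; as step 6 in the ticket shows, a still-running attempt can add a duplicate after the check.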
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Affects Version/s: 2.2.0 > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
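The shape of the workaround the description implies is to extend the record schema with the partition-key columns before calling setSchema(). The HCat calls in the comment below are hedged (check the API of your HCatalog version); the runnable code models only the schema-completion step with plain lists standing in for HCatSchema:

```java
import java.util.List;
import java.util.ArrayList;

// Why getTableSchema() alone is not enough for dynamic partitioning:
// it returns only the record columns, so the partition-key columns must
// be appended before setSchema(). With the real API this looks roughly
// like (hedged, version-dependent):
//
//   HCatSchema schema = HCatOutputFormat.getTableSchema(job.getConfiguration());
//   for (HCatFieldSchema partKey : partitionColumns.getFields()) {
//       schema.append(partKey);
//   }
//   HCatOutputFormat.setSchema(job, schema);
//
public class CompleteOutputSchema {
    public static List<String> complete(List<String> recordSchema, List<String> partitionKeys) {
        List<String> full = new ArrayList<>(recordSchema);
        for (String key : partitionKeys) {
            if (!full.contains(key)) {
                full.add(key);  // partition keys conventionally come last
            }
        }
        return full;
    }
}
```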
[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17153: Attachment: HIVE-17153.2.patch > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Affects Version/s: (was: 0.13.1) 2.2.0 Status: Patch Available (was: Open) > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch, HIVE-8472.branch-2.2.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Status: Patch Available (was: Open) > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Attachment: HIVE-17181.1.patch Patch for master. > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Attachment: HIVE-17181.branch-2.patch Patch for branch-2. > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17181: --- > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Attachment: HIVE-8472.branch-2.2.patch And here's one for branch-2. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 0.13.1 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch, HIVE-8472.branch-2.2.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Attachment: HIVE-8472.1.patch Here's a patch for the master branch. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 0.13.1 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102659#comment-16102659 ] Rui Li commented on HIVE-17087: --- +1 > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > 
outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 > keys: > 0 _col1 (type: int) > 1 _col1 (typ
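The cleanup the patch performs can be sketched abstractly. This is a toy, with plain strings standing in for Hive's work/operator classes, under the assumption (hedged, inferred from the issue title and plan above) that a Spark partition pruning sink whose target work now arrives via the map join's broadcast path no longer serves any purpose and its branch can be dropped:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model: each pruning sink targets a named map work. After map-join
// conversion, sinks whose targets are covered by the broadcast are removed.
public class DppCleanup {
    public static Map<String, String> removeUselessSinks(
            Map<String, String> sinkToTargetWork, Set<String> broadcastWorks) {
        Map<String, String> kept = new TreeMap<>();
        for (Map.Entry<String, String> e : sinkToTargetWork.entrySet()) {
            if (!broadcastWorks.contains(e.getValue())) {
                kept.put(e.getKey(), e.getValue());  // still a useful pruning target
            }
        }
        return kept;
    }
}
```

In the plan above that would correspond to dropping the Stage-2 branch ending in the Spark Partition Pruning Sink Operator targeting Map 2, since Stage-3 already hashes and broadcasts that table.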
[jira] [Commented] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102657#comment-16102657 ] Hive QA commented on HIVE-17153: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879068/HIVE-17153.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6141/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6141/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6141/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879068 - PreCommit-HIVE-Build > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796 ] Ke Jia edited comment on HIVE-17139 at 7/27/17 3:41 AM: With this patch, I tested "select case when a=1 then trim(b) end from test_orc_5000" on my development machine. The data scale is almost 50 million records in table test_orc_5000(a int, b string) stored as ORC. The execution engine is Spark. I ran three experiments; the averages are in the table below. The results show the Spark execution time dropped from 35.76s to 32.57s, the time spent in VectorSelectOperator dropped from 3.12s to 0.89s, and the count of then-expression evaluations went from 4735 to 5000712. || ||Non-optimization||Optimization||Improvement|| |Hos|35.76s|32.57s|8.9%| |VectorSelectOperator|3.12s|0.89s|71.5%| |count|4735|5000712|8.99%| was (Author: jk_self): With this patch, I test "select case when a=1 then trim(b) end from test_orc_5000" in my development machine. The data scale is almost 50 million records in table test_orc_5000(a int, b string) stored as ORC. The execution engine is spark. I do three experiments and the average value is as below table. The result shows the execution time of spark from 35.76s to 32.57s, the time cost of VectorSelectOperator from 3.12s to 0.89s and the count of then expression evaluation from 4735 to 5000712. || ||Non-optimization||Optimization||Improvement|| |Hos|35.76s|32.57s|8.9%| |VectorSelectOperator|3.12s|0.89s|7.15%| |count|4735|5000712|8.99%| > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. 
> > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The CASE WHEN and IF statement execution for Hive vectorization is not > optimal: in the current implementation, all of the conditional and else > expressions are evaluated. The optimized approach is to update the selected > array of the batch parameter after the conditional expression is executed. > Then the else expression will only process the selected rows instead of all > of them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
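The selected-array technique described above can be sketched roughly as follows. MiniBatch here is a hypothetical stand-in for Hive's VectorizedRowBatch; the real operator works on column vectors, so this illustrates only the control flow (narrow the selected array on the condition, then evaluate the THEN expression over just those rows), not the actual implementation.

```java
// Sketch of the selected-array optimization for "case when a=1 then trim(b) end".
// MiniBatch is a simplified stand-in for Hive's VectorizedRowBatch.
public class SelectedArraySketch {
    public static class MiniBatch {
        public int[] a;
        public String[] b;
        public int[] selected; // indices of rows still in play
        public int size;       // number of valid entries in selected

        public MiniBatch(int[] a, String[] b) {
            this.a = a;
            this.b = b;
            this.size = a.length;
            this.selected = new int[size];
            for (int i = 0; i < size; i++) selected[i] = i;
        }
    }

    // Narrow the selected array to rows where the condition (a == 1) holds,
    // so later expressions touch only those rows.
    public static void applyCondition(MiniBatch batch) {
        int newSize = 0;
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            if (batch.a[row] == 1) batch.selected[newSize++] = row;
        }
        batch.size = newSize;
    }

    // Evaluate the THEN expression (trim) only over the selected rows;
    // returns how many evaluations actually happened.
    public static int evaluateThen(MiniBatch batch) {
        int evaluations = 0;
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            batch.b[row] = batch.b[row].trim();
            evaluations++;
        }
        return evaluations;
    }

    public static void main(String[] args) {
        MiniBatch batch = new MiniBatch(
            new int[]{1, 2, 1, 3},
            new String[]{" x ", " y ", " z ", " w "});
        applyCondition(batch);
        System.out.println(evaluateThen(batch)); // trim runs for 2 rows, not 4
    }
}
```

Without the condition-first narrowing, the unoptimized path would evaluate trim for all four rows and discard two of the results.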
[jira] [Assigned] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-8472: -- Assignee: Mithun Radhakrishnan > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 0.13.1 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
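A hypothetical usage of the requested statement, modeled on the existing table-level syntax (the database name and path below are illustrative only, not syntax confirmed by this ticket):

```sql
-- Hypothetical, mirroring ALTER TABLE tablename SET LOCATION;
-- database name and HDFS path are placeholders.
ALTER DATABASE sales_db SET LOCATION 'hdfs://namenode:8020/warehouse/sales_db.db';
```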
[jira] [Commented] (HIVE-17146) Spark on Hive - Exception while joining tables - "Requested replication factor of 10 exceeds maximum of x"
[ https://issues.apache.org/jira/browse/HIVE-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102625#comment-16102625 ] Rui Li commented on HIVE-17146: --- [~cabot], the code intends to distribute the hash table to more nodes so that following tasks are more likely to get the data from local DN. In that sense, it's intended to be bigger than {{dfs.replication}}. That's why we chose the magic number 10 (not an ideal solution I agree). However, since {{minReplication = Math.min(minReplication, dfsMaxReplication)}}, I still don't understand how the replication factor exceeds {{dfs.replication.max}} (by default 512)? > Spark on Hive - Exception while joining tables - "Requested replication > factor of 10 exceeds maximum of x" > --- > > Key: HIVE-17146 > URL: https://issues.apache.org/jira/browse/HIVE-17146 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.1.1, 3.0.0 >Reporter: George Smith >Assignee: Ashutosh Chauhan > > We found a bug in the current implementation of > [org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java] > The *magic number 10* for minReplication factor can cause the exception when > the configuration parameter _dfs.replication_ is lower than 10. > Consider these [properties > configuration|https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] > on our cluster (with less than 10 nodes): > {code} > dfs.namenode.replication.min=1 > dfs.replication=2 > dfs.replication.max=512 (that's the default value) > {code} > The current implementation counts target file replication as follows > (relevant snippets of the code): > {code} > private int minReplication = 10; > ... 
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > ... > FileSystem fs = path.getFileSystem(htsOperator.getConfiguration()); > short replication = fs.getDefaultReplication(path); > ... > int numOfPartitions = replication; > replication = (short) Math.max(minReplication, numOfPartitions); > //use replication value in fs.create(path, replication); > {code} > With the current code, the replication value used is 10 and the config value > _dfs.replication_ is not used at all. > There are several (easy) ways to fix it: > # Set the field {code}private int minReplication = 1;{code} I don't see any > obvious reason for the value 10. Or > # Initialize minReplication from the config value _dfs.namenode.replication.min_, > with a default value of 1. Or > # Compute the replication this way: {code}replication = Math.min(numOfPartitions, > dfsMaxReplication);{code} Or > # Use {code}replication = numOfPartitions;{code} directly. > The config value _dfs.replication_ has a default value of 3, which is supposed to > always be lower than _dfs.replication.max_, so no check is probably needed. > Any suggestions on which option to choose? > As a *workaround* for this issue we had to set dfs.replication.max=2, but > obviously the _dfs.replication_ value should NOT be ignored and the problem > should be resolved. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
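To make the interaction concrete, the snippet below contrasts the arithmetic of the quoted code with fix option 3 from the report. The method names are mine; only the min/max logic follows the quoted implementation, and the configuration values come from the cluster described in the report.

```java
// Sketch of the replication computation in SparkHashTableSinkOperator
// versus the proposed fix (option 3 in the report above).
public class ReplicationSketch {
    static final int MIN_REPLICATION = 10; // the magic number in question

    // Current behavior: dfs.replication is ignored once it falls below 10.
    public static short currentReplication(short dfsReplication, int dfsMaxReplication) {
        int minReplication = Math.min(MIN_REPLICATION, dfsMaxReplication);
        int numOfPartitions = dfsReplication; // stands in for fs.getDefaultReplication(path)
        return (short) Math.max(minReplication, numOfPartitions);
    }

    // Proposed option 3: cap the partition count by dfs.replication.max
    // so dfs.replication is honored on small clusters.
    public static short proposedReplication(short dfsReplication, int dfsMaxReplication) {
        int numOfPartitions = dfsReplication;
        return (short) Math.min(numOfPartitions, dfsMaxReplication);
    }

    public static void main(String[] args) {
        // dfs.replication=2, dfs.replication.max=512 (values from the report):
        System.out.println(currentReplication((short) 2, 512));  // 10, exceeds cluster size
        System.out.println(proposedReplication((short) 2, 512)); // 2, honors dfs.replication
    }
}
```

This also shows why the reported workaround (dfs.replication.max=2) helps: it drags minReplication down to 2 through the Math.min call.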
[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102615#comment-16102615 ] Hive QA commented on HIVE-17050: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879062/HIVE-17050.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join4] (batchId=81) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6140/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6140/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6140/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase 
Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879062 - PreCommit-HIVE-Build > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH, HIVE-17050.4.patch > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
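A plausible model of the failure mode above: if the client collapses a multiline query onto one line before handling "--" comments, the comment swallows the remainder of the statement. This is an illustration of that hazard only, not Beeline's actual code path, and the naive indexOf-based stripping would also mangle string literals containing "--".

```java
// Sketch of why "select 1, --test\n2" breaks when lines are joined before
// comment stripping, and how stripping per line first keeps the query intact.
public class CommentStripSketch {
    // Naive handling: join lines first, so "--test" swallows the "2".
    public static String joinThenStrip(String query) {
        String oneLine = query.replace('\n', ' ');
        int idx = oneLine.indexOf("--");
        return (idx >= 0 ? oneLine.substring(0, idx) : oneLine).trim();
    }

    // Comment-aware handling: strip the comment from each line, then join.
    public static String stripThenJoin(String query) {
        StringBuilder sb = new StringBuilder();
        for (String line : query.split("\n")) {
            int idx = line.indexOf("--");
            sb.append(idx >= 0 ? line.substring(0, idx) : line).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String query = "select 1, --test\n2";
        System.out.println(joinThenStrip(query));  // truncated: "select 1," fails to parse
        System.out.println(stripThenJoin(query));  // intact: "select 1,  2"
    }
}
```

The truncated form matches the ParseException in the report, which complains about unexpected end of input right after the select list.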
[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16948: Attachment: HIVE-16948_1.patch upload HIVE-16948_1.patch to trigger HiveQA. > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948_1.patch, HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > 
COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash >
[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102562#comment-16102562 ] Eugene Koifman commented on HIVE-16077: --- No related failures for HIVE-16077.08.patch. [~prasanth_j], could you review please? > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: (was: HIVE-16077.07.patch) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: (was: HIVE-16077.06.patch) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: (was: HIVE-16077.05.patch) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Patch Available (was: Open) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: HIVE-17139.3.patch > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Open (was: Patch Available) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: (was: HIVE-17139.3.patch) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17176) Add ASF header for LlapAllocatorBuffer.java
[ https://issues.apache.org/jira/browse/HIVE-17176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saijin Huang updated HIVE-17176: Affects Version/s: 3.0.0 > Add ASF header for LlapAllocatorBuffer.java > --- > > Key: HIVE-17176 > URL: https://issues.apache.org/jira/browse/HIVE-17176 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Saijin Huang >Assignee: Saijin Huang >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17176.1.patch > > > Reproduced the problem from HIVE-16233 and found the ASF header missing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17177) move TestSuite.java to the right position
[ https://issues.apache.org/jira/browse/HIVE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saijin Huang updated HIVE-17177: Affects Version/s: 3.0.0 > move TestSuite.java to the right position > - > > Key: HIVE-17177 > URL: https://issues.apache.org/jira/browse/HIVE-17177 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Saijin Huang >Assignee: Saijin Huang >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17177.1.patch > > > TestSuite.java does not currently belong to the package > org.apache.hive.storage.jdbc. Move it to the right location. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102541#comment-16102541 ] Hive QA commented on HIVE-16077: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879060/HIVE-16077.08.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11016 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6139/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6139/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6139/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879060 - PreCommit-HIVE-Build > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Patch Available (was: Open) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Open (was: Patch Available) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17179: Status: Patch Available (was: Open) > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17179.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17179: Attachment: HIVE-17179.1.patch > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17179.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102502#comment-16102502 ] Sahil Takiar commented on HIVE-16948: - Overall, the approach LGTM. You may need to re-attach the patch, seem Hive QA hasn't run yet. > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > 
Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) >
[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task
[ https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17113: -- Attachment: HIVE-17113.3.patch > Duplicate bucket files can get written to table by runaway task > --- > > Key: HIVE-17113 > URL: https://issues.apache.org/jira/browse/HIVE-17113 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, > HIVE-17113.3.patch > > > Saw a table get a duplicate bucket file from a Hive query. It looks like the > following happened: > 1. Task attempt A_0 starts, but then stops making progress > 2. The job was running with speculative execution on, and task attempt A_1 is > started > 3. Task attempt A_1 finishes execution and saves its output to the temp > directory. > 4. A task kill is sent to A_0, though this does not appear to actually kill A_0 > 5. The job for the query finishes and Utilities.mvFileToFinalPath() calls > Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files > 6. A_0 (still running) finally finishes and saves its file to the temp > directory. At this point we now have duplicate bucket files - oops! > 7. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the > final location, where it is later moved to the partition directory. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
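The race above hinges on the timing of the duplicate-file check: removeTempOrDuplicateFiles() runs before the runaway attempt's late file lands, so the duplicate is never seen. The per-bucket de-duplication the check performs can be sketched as follows; this is an illustrative Python model, and the file-naming scheme and keep-largest policy are assumptions, not Hive's exact Java logic.

```python
# Hypothetical sketch of the de-duplication that
# Utilities.removeTempOrDuplicateFiles() must perform: when two task
# attempts (e.g. "000000_0" and "000000_1") both wrote a file for the
# same bucket, only one file per bucket id may survive. Naming and the
# keep-largest policy are illustrative assumptions.

def remove_duplicate_bucket_files(files):
    """files: dict of file name -> size; name format '<bucket>_<attempt>'."""
    keep = {}
    for name, size in files.items():
        bucket = name.split("_")[0]
        # Keep at most one attempt's output per bucket; prefer the larger
        # file as a stand-in for "the attempt that finished properly".
        if bucket not in keep or size > files[keep[bucket]]:
            keep[bucket] = name
    return sorted(keep.values())

# Duplicate outputs for bucket 000000 from attempts 0 and 1:
print(remove_duplicate_bucket_files(
    {"000000_0": 500, "000000_1": 800, "000001_0": 300}))
# → ['000000_1', '000001_0']
```

The bug is not in this logic itself but in when it runs: A_0's file arrives after the check, so no amount of de-duplication at that point can catch it.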
[jira] [Resolved] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-16710. - Resolution: Fixed Fix Version/s: 1.3.0 Release Note: This issue is resolved in 2.3 and 3.0 via a different commit (HIVE-12274) > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 1.3.0 > > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
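The fix the ticket asks for is simply to turn the hard-coded limit into a config lookup. A minimal Python sketch of the idea follows; the config key name and default value below are made up for illustration, and the real change would live in the Java metastore.

```python
# Sketch of a configurable type-name length check in the spirit of
# HIVE-16710. The key "hive.metastore.max.typename.length" and the
# default are illustrative assumptions, not Hive's real names/values.

DEFAULT_MAX_TYPENAME_LENGTH = 2000  # illustrative default

def validate_type_name(type_name, conf=None):
    conf = conf or {}
    # A configurable limit lets users with very long type names
    # (e.g. deeply nested structs) raise it instead of failing hard.
    limit = int(conf.get("hive.metastore.max.typename.length",
                         DEFAULT_MAX_TYPENAME_LENGTH))
    if len(type_name) > limit:
        raise ValueError(
            f"Type name length {len(type_name)} exceeds limit {limit}")
    return type_name
```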
[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17153: Status: Patch Available (was: Open) > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102468#comment-16102468 ] Hive QA commented on HIVE-17006: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879056/HIVE-17006.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part5] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[schemeAuthority] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6138/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6138/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6138/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879056 - PreCommit-HIVE-Build > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
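Option (3) from the description — caching raw Parquet column-chunk bytes as they sit on disk — can be modeled very simply: a cache keyed by (file id, chunk offset, chunk length) that performs the disk read only on a miss. This is an illustrative Python sketch, not LLAP's actual Java cache, which sits on top of a low-level buffer allocator.

```python
# Minimal model of column-chunk-level caching: raw bytes are cached
# as-is, keyed by where they live on disk. Names are illustrative.

class ColumnChunkCache:
    def __init__(self):
        self._chunks = {}

    def read_chunk(self, file_id, offset, length, read_from_disk):
        key = (file_id, offset, length)
        if key not in self._chunks:
            # Miss: do the disk read once and keep the raw bytes.
            self._chunks[key] = read_from_disk(offset, length)
        return self._chunks[key]

calls = []
def fake_disk_read(offset, length):
    calls.append((offset, length))
    return b"x" * length

cache = ColumnChunkCache()
cache.read_chunk("f1", 0, 4, fake_disk_read)
cache.read_chunk("f1", 0, 4, fake_disk_read)  # second read served from cache
print(len(calls))  # → 1
```

This matches the point in the description: since Parquet is read at column-chunk granularity anyway, caching at that granularity requires no changes to the reader's access pattern.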
[jira] [Reopened] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reopened HIVE-16710: - actually this is still needed for branch-1 > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17153: Attachment: HIVE-17153.1.patch > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102427#comment-16102427 ] Vineet Garg commented on HIVE-16811: RB is here https://reviews.apache.org/r/61165/ > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch > > > Currently Join ordering completely bails out in absence of statistics and > this could lead to bad joins such as cross joins. > e.g. following select query will produce cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING) > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBERINT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAGSTRING, > L_LINESTATUSSTRING, > l_shipdate STRING, > L_COMMITDATESTRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl > int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent join ordering algorithm to bail out and come up > with join at least better than cross join -- This message was sent by Atlassian JIRA (v6.4.14#64029)
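The kind of fallback estimation this ticket proposes can be sketched like this: derive a nonzero row count from the on-disk data size and assumed per-column widths, so the join-ordering algorithm never bails out into a cross join. The widths below are illustrative guesses, not Hive's actual defaults.

```python
# Illustrative sketch of estimating a row count when metastore stats
# are absent. Column widths are assumptions for the sketch only.

ASSUMED_WIDTH = {"int": 4, "double": 8, "string": 100}

def estimate_row_count(raw_data_size_bytes, column_types):
    row_width = sum(ASSUMED_WIDTH.get(t, 8) for t in column_types)
    # Never report zero rows: a nonzero guess keeps the join-ordering
    # algorithm from bailing out into a cross join.
    return max(1, raw_data_size_bytes // max(1, row_width))

print(estimate_row_count(11200, ["int", "string", "int"]))  # → 103
```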
[jira] [Comment Edited] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102411#comment-16102411 ] Sergey Shelukhin edited comment on HIVE-16710 at 7/26/17 10:47 PM: --- Superseded by HIVE-12274 was (Author: sershe): Superceded by HIVE-12274 > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16710: Resolution: Implemented Status: Resolved (was: Patch Available) Superseded by HIVE-12274 > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16710: Fix Version/s: (was: 3.0.0) > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17050: -- Attachment: HIVE-17050.4.patch Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resubmit the patch. > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH, HIVE-17050.4.patch > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
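The failure mode in the bug description is easy to model: once beeline collapses the lines of a multi-line `-e` query, a naive comment strip from the first `--` onward swallows the rest of the statement, so `select 1, --test\n2` parses as `select 1,`. Comments must be stripped per line, before the lines are joined. A simplified Python sketch of that per-line approach follows; it deliberately ignores the harder case of `--` appearing inside quoted strings, and is not the actual patch.

```python
# Simplified sketch: strip "--" line comments per line, then join.
# Does not handle "--" inside string literals (the real fix must).

def strip_line_comments(query):
    cleaned = []
    for line in query.split("\n"):
        idx = line.find("--")
        cleaned.append(line if idx < 0 else line[:idx])
    return " ".join(part.strip() for part in cleaned).strip()

print(strip_line_comments("select 1, --test\n2"))  # → select 1, 2
```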
[jira] [Comment Edited] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102389#comment-16102389 ] Yibing Shi edited comment on HIVE-17050 at 7/26/17 10:30 PM: - Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resubmit the patch. was (Author: yibing): Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resutmit the patch. > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH, HIVE-17050.4.patch > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: HIVE-16077.08.patch > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-11266) count(*) wrong result based on table statistics for external tables
[ https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11266: Summary: count(*) wrong result based on table statistics for external tables (was: count(*) wrong result based on table statistics) > count(*) wrong result based on table statistics for external tables > --- > > Key: HIVE-11266 > URL: https://issues.apache.org/jira/browse/HIVE-11266 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Simone Battaglia >Assignee: Pengcheng Xiong >Priority: Critical > > Hive returns a wrong count result on an external table with table statistics if > I change the table data files. > This is the scenario in detail: > 1) create external table my_table (...) location 'my_location'; > 2) analyze table my_table compute statistics; > 3) change/add/delete one or more files in 'my_location' directory; > 4) select count(\*) from my_table; > In this case the count query doesn't generate a MR job and returns the result > based on table statistics. This result is wrong because it is based on > statistics stored in the Hive metastore and doesn't take into account > modifications introduced on the data files. > Obviously when setting "hive.compute.query.using.stats" to FALSE this problem > doesn't occur, but the default value of this property is TRUE. > I think that this post on stackoverflow, which shows another type of bug > in the case of multiple inserts, is also related to the one that I reported: > http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics
[ https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102369#comment-16102369 ] Sergey Shelukhin commented on HIVE-11266: - [~pxiong] the issue is still there for external tables. The semantics for external tables in Hive are not well defined, but many people assume (and I agree) that it's ok to manually manage these using file operations, which invalidates the stats without Hive knowing about it. I don't think this setting should be used for external tables. Thoughts? cc [~ashutoshc] [~hagleitn] > count(*) wrong result based on table statistics > --- > > Key: HIVE-11266 > URL: https://issues.apache.org/jira/browse/HIVE-11266 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Simone Battaglia >Assignee: Pengcheng Xiong >Priority: Critical > > Hive returns a wrong count result on an external table with table statistics if > I change the table data files. > This is the scenario in detail: > 1) create external table my_table (...) location 'my_location'; > 2) analyze table my_table compute statistics; > 3) change/add/delete one or more files in 'my_location' directory; > 4) select count(\*) from my_table; > In this case the count query doesn't generate a MR job and returns the result > based on table statistics. This result is wrong because it is based on > statistics stored in the Hive metastore and doesn't take into account > modifications introduced on the data files. > Obviously when setting "hive.compute.query.using.stats" to FALSE this problem > doesn't occur, but the default value of this property is TRUE. > I think that this post on stackoverflow, which shows another type of bug > in the case of multiple inserts, is also related to the one that I reported: > http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
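The guard suggested in the comment above — never answer count(*) from metastore stats for external tables — can be sketched as a simple predicate. Field and key names below are illustrative, not Hive's actual internal API.

```python
# Sketch of the proposed guard: stats can answer count(*) only for
# managed tables, because external table files can change behind
# Hive's back. Table/field names are illustrative assumptions.

def can_answer_count_from_stats(table, conf):
    if not conf.get("hive.compute.query.using.stats", True):
        return False
    if table.get("type") == "EXTERNAL_TABLE":
        # Files may have been added/removed directly on the filesystem,
        # so stored row counts cannot be trusted.
        return False
    return table.get("stats_accurate", False)

print(can_answer_count_from_stats(
    {"type": "EXTERNAL_TABLE", "stats_accurate": True}, {}))  # → False
```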
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102342#comment-16102342 ] Sergey Shelukhin commented on HIVE-17006: - RB at https://reviews.apache.org/r/61164/ [~Ferd] I cannot find your username on RB, I can add you if you tell me what it is :) > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102338#comment-16102338 ] liyunzhang_intel edited comment on HIVE-17087 at 7/26/17 9:53 PM: -- [~stakiar]: LGTM +1, meanwhile can you spend some time to review HIVE-16948, another bug about dpp in HOS, thanks! was (Author: kellyzly): [~stakiar]: GTM +1, meanwhile can you spend some time to review HIVE-16948, another bug about dpp in HOS, thanks! > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 
> Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 >
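The cleanup this patch performs can be modeled simply: after a join is converted to a map join, a Spark partition-pruning sink whose target work now feeds the map join's hash-table (small-table) side is unnecessary, and its producing branch can be removed. The graph model below is an illustrative Python sketch, not the real operator-tree walk in the patch.

```python
# Sketch: drop DPP sinks whose target work became the small-table side
# of a map join. Data model (dicts with 'target_work') is illustrative.

def prune_useless_dpp_sinks(dpp_sinks, small_table_works):
    """dpp_sinks: list of dicts with a 'target_work' key;
    small_table_works: set of work names feeding hash-table sinks."""
    return [s for s in dpp_sinks
            if s["target_work"] not in small_table_works]

sinks = [{"id": "dpp1", "target_work": "Map 2"},
         {"id": "dpp2", "target_work": "Map 1"}]
# After map-join conversion, "Map 2" feeds a hash-table sink (small table):
print([s["id"] for s in prune_useless_dpp_sinks(sinks, {"Map 2"})])
# → ['dpp2']
```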
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102338#comment-16102338 ] liyunzhang_intel commented on HIVE-17087: - [~stakiar]: GTM +1, meanwhile can you spend some time to review HIVE-16948, another bug about dpp in HOS, thanks! > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic 
stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >
[jira] [Updated] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17006: Status: Patch Available (was: Open) > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102336#comment-16102336 ] Sergey Shelukhin edited comment on HIVE-17006 at 7/26/17 9:51 PM: -- The initial patch after some cleanup, additions and fixes. This shares a lot of the code with HIVE-15665 and the two metadata caches need to be merged. Presumably one of these would be committed first and the other would be merged. Still need to test on the cluster was (Author: sershe): The initial patch after some cleanup, additions and fixes. This shares a lot of the code with HIVE-15665 and the two metadata caches need to be merged. Presumably one of these would be committed first and the other would be merged. > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17006: Attachment: HIVE-17006.patch The initial patch, after some cleanup, additions, and fixes. This shares a lot of code with HIVE-15665, and the two metadata caches need to be merged. Presumably one of these would be committed first and the other would be merged.
[jira] [Commented] (HIVE-17160) Adding Kerberos authorization to the Druid Hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102291#comment-16102291 ] Siddharth Seth commented on HIVE-17160: --- The DagUtils part of the changes looks good to me. I believe the existing code for non-cluster split generation was already non-functional, based on the testing you had done? If that's the case, there's no reason to add another option to choose between GenerateSplitsOnClientMethod1 and GenerateSplitsOnClientMethod2. > Adding Kerberos authorization to the Druid Hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.patch > > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials.
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102262#comment-16102262 ] Sahil Takiar commented on HIVE-16998: - [~janulatha] looks like there are some issues with building your patch, probably because {{SparkRemoveDynamicPruningBySize.java}} was renamed. You can run {{./dev-support/smart-apply-patch.sh}} locally to see how Hive QA will apply your patch file. Also, I was taking a closer look at the new .q file you added, and something doesn't look correct in your explain plan. When DPP is enabled for your query, Stage-2 looks incorrect. It looks like there is an extraneous reduce phase in the operator plan. If you run {{spark_dynamic_partition_pruning.q}} locally, you will probably see the issue too. Can you update the RB for this JIRA? > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins.
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Resolution: Fixed Status: Resolved (was: Patch Available) > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 9.3 and excluding several other > dependencies on HIVE-16049, the HS2 WebUI stopped working, throwing an NPE. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102253#comment-16102253 ] Hive QA commented on HIVE-16998: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879034/HIVE16998.2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6137/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6137/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6137/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-26 20:42:31.265 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-6137/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-26 20:42:31.268 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at f21462d HIVE-17128: Operation Logging leaks file descriptors as the log4j Appender is never closed (Andrew Sherman, reviewed by Aihua Xu and Peter Vary) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at f21462d HIVE-17128: Operation Logging leaks file descriptors as the log4j Appender is never closed (Andrew Sherman, reviewed by Aihua Xu and Peter Vary) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-26 20:42:37.439 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p0 patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Hunk #1 succeeded at 3425 (offset 19 lines). 
patching file itests/src/test/resources/testconfiguration.properties patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruning.java (renamed from ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java) patching file ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java patching file ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_mapjoin_only.q patching file ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_mapjoin_only.q.out + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven ANTLR Parser Generator Version 3.5.2 Output file /data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java does not exist: must build /data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g org/apache/hadoop/hive/metastore/parser/Filter.g DataNucleus Enhancer (version 4.1.17) for API "JDO" DataNucleus Enhancer : Classpath >> /usr/share/maven/boot/plexus-classworlds-2.x.jar ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo ENHANCED (Persistable) : org.apache.hadoop.hive.me
[jira] [Commented] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102249#comment-16102249 ] Aihua Xu commented on HIVE-17088: - +1 on the addendum patch.
[jira] [Commented] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102245#comment-16102245 ] Sergio Peña commented on HIVE-17088: The test failures are unrelated and were already failing in older jobs.
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16998: Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102223#comment-16102223 ] Sahil Takiar commented on HIVE-16998: - [~janulatha] I changed the status of this JIRA to "Patch Available" so that Hive QA picks it up and runs the pre-commit job.
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102221#comment-16102221 ] Sahil Takiar commented on HIVE-16998: - Overall, the approach LGTM, just a few more comments: * I think we should reverse the hierarchy of the configs ** If {{hive.spark.dynamic.partition.pruning.map.join.only}} is true and {{hive.spark.dynamic.partition.pruning}} is false, then DPP is only enabled for map-joins ** If {{hive.spark.dynamic.partition.pruning}} is true, DPP is enabled for shuffle joins and map-joins, and the value of {{hive.spark.dynamic.partition.pruning.map.join.only}} is ignored ** The advantage is that if a user wants to enable DPP just for map-joins, there is only one config to set; if they want to enable it for both shuffle joins and map-joins, there is also only one config to set * Can you update the javadocs for {{SparkRemoveDynamicPruning}}? It no longer just removes DPP based on estimated output size * {{Stage: Stage-0 - Fetch Operator}} isn't actually a Spark job; it's the {{FetchOperator}} that is run by HS2 * There is a line in the qfile that says {{-- checking without partition pruning enabled}} even though DPP is enabled
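The config precedence proposed in the comment above can be sketched as a single predicate. This is a sketch of the proposal, not the committed behavior; the parameter names simply mirror the two config keys under discussion:

```java
/** Sketch of the proposed precedence between the two HoS DPP configs. */
public class DppConfigSketch {
    /**
     * @param dppEnabled      value of hive.spark.dynamic.partition.pruning
     * @param mapJoinOnly     value of hive.spark.dynamic.partition.pruning.map.join.only
     * @param targetIsMapJoin whether the pruning sink's join is a map-join
     */
    public static boolean shouldApplyDpp(boolean dppEnabled, boolean mapJoinOnly,
                                         boolean targetIsMapJoin) {
        if (dppEnabled) {
            // DPP for both shuffle joins and map-joins; map.join.only is ignored.
            return true;
        }
        // Map-join-only mode: DPP applies only when the join is a map-join.
        return mapJoinOnly && targetIsMapJoin;
    }
}
```

The point of this shape is that either use case needs exactly one config flipped: {{map.join.only}} alone for map-join-only pruning, or the main {{dynamic.partition.pruning}} flag alone for pruning everywhere.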
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16998: --- Attachment: HIVE16998.2.patch Incorporated comments from Lefty and Sahil. Modularized the map-join check so that the same code can be called both when splitting the operator tree for DPP and when checking whether to remove DPP. Renamed SparkRemoveDynamicPruningBySize.java to SparkRemoveDynamicPruning.java.
[jira] [Assigned] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-17179: --- > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17178) Spark Partition Pruning Sink Operator can't target multiple Works
[ https://issues.apache.org/jira/browse/HIVE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-17178: --- > Spark Partition Pruning Sink Operator can't target multiple Works > - > > Key: HIVE-17178 > URL: https://issues.apache.org/jira/browse/HIVE-17178 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > A Spark Partition Pruning Sink Operator cannot be used to target multiple Map > Work objects. The entire DPP subtree (SEL-GBY-SPARKPRUNINGSINK) is duplicated > if a single table needs to be used to target multiple Map Works. > The following query shows the issue: > {code} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table part_table_1 (col int) partitioned by (part_col int); > create table part_table_2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table part_table_1 add partition (part_col=1); > insert into table part_table_1 partition (part_col=1) values (1), (2), (3), > (4); > alter table part_table_1 add partition (part_col=2); > insert into table part_table_1 partition (part_col=2) values (1), (2), (3), > (4); > alter table part_table_2 add partition (part_col=1); > insert into table part_table_2 partition (part_col=1) values (1), (2), (3), > (4); > alter table part_table_2 add partition (part_col=2); > insert into table part_table_2 partition (part_col=2) values (1), (2), (3), > (4); > explain select * from regular_table, part_table_1, part_table_2 where > regular_table.col = part_table_1.part_col and regular_table.col = > part_table_2.part_col; > {code} > The explain plan is > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > 
TableScan > alias: regular_table > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: col is not null (type: boolean) > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col0 (type: int) > 1 _col1 (type: int) > 2 _col1 (type: int) > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 3 > Local Work: > Map Reduce Local Work > Map 3 > Map Operator Tree:
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Description: Implement Native Vector GroupBy using the fast hash table technology developed for Native Vector MapJoin, etc. The patch is currently limited to a single Long key, aggregation on Long columns, and no more than 31 columns. Three new classes are introduced that store the count in the slot table and don't allocate hash elements: {noformat} COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator {noformat} And a new class that aggregates a single Long key: {noformat} VectorGroupByHashOneLongKeyOperator {noformat} was:Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin, etc. > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using the fast hash table technology developed > for Native Vector MapJoin, etc. > The patch is currently limited to a single Long key, aggregation on Long columns, > and no more than 31 columns. > Three new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat}
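The "store the count in the slot table" idea can be illustrated with a toy open-addressing table — a hypothetical sketch, not the HIVE-12369 code. Keys and counts live in two parallel long arrays, so a COUNT aggregation allocates no per-key hash entry objects at all:

```java
/** Toy long-key COUNT table: parallel arrays, linear probing, no per-entry allocation. */
public class LongKeyCountSketch {
    private static final int CAPACITY = 1 << 10;      // power of two; no resize in this sketch
    private final long[] keys = new long[CAPACITY];
    private final long[] counts = new long[CAPACITY]; // count == 0 marks an empty slot

    /** Increments the count for {@code key} (assumes the table never fills up). */
    public void add(long key) {
        int slot = (int) (mix(key) & (CAPACITY - 1));
        while (counts[slot] != 0 && keys[slot] != key) {
            slot = (slot + 1) & (CAPACITY - 1);       // linear probe to the next slot
        }
        keys[slot] = key;
        counts[slot]++;
    }

    /** Returns the count for {@code key}, or 0 if the key was never added. */
    public long count(long key) {
        int slot = (int) (mix(key) & (CAPACITY - 1));
        while (counts[slot] != 0) {
            if (keys[slot] == key) {
                return counts[slot];
            }
            slot = (slot + 1) & (CAPACITY - 1);
        }
        return 0;
    }

    private static long mix(long k) {                 // cheap avalanching hash for long keys
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        return k;
    }
}
```

Avoiding per-entry objects is what makes this "native": there is nothing for the GC to trace per group, which is the same property the fast Native Vector MapJoin tables exploit.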
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Status: Patch Available (was: In Progress) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Attachment: HIVE-12369.05.patch > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Status: In Progress (was: Patch Available) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16758) Better Select Number of Replications
[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102124#comment-16102124 ] BELUGA BEHR commented on HIVE-16758: [~aihuaxu] [~pvary] [~ctang.ma] Thoughts? > Better Select Number of Replications > > > Key: HIVE-16758 > URL: https://issues.apache.org/jira/browse/HIVE-16758 > Project: Hive > Issue Type: Improvement >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16758.1.patch > > > {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}} > We should be smarter about how we pick a replication number. We should add a > new configuration equivalent to {{mapreduce.client.submit.file.replication}}. > This value should be around the square root of the number of nodes and not > hard-coded in the code. > {code} > public static final String DFS_REPLICATION_MAX = "dfs.replication.max"; > private int minReplication = 10; > @Override > protected void initializeOp(Configuration hconf) throws HiveException { > ... > int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > } > {code} > https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml -- This message was sent by Atlassian JIRA (v6.4.14#64029)
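[Editor's note] The heuristic proposed in HIVE-16758 (a replication factor near the square root of the node count, capped by {{dfs.replication.max}}) can be sketched as below. Class and method names are illustrative, not part of the attached patch:

```java
// Sketch of the replication heuristic described above (names are made up,
// not Hive's): aim for roughly sqrt(numNodes) replicas, but never exceed
// the cluster's dfs.replication.max, and always keep at least one replica.
public class ReplicationChooser {
    public static int chooseReplication(int numNodes, int dfsReplicationMax) {
        int bySqrt = (int) Math.ceil(Math.sqrt(numNodes));
        return Math.max(1, Math.min(bySqrt, dfsReplicationMax));
    }
}
```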
[jira] [Commented] (HIVE-17132) Add InterfaceAudience and InterfaceStability annotations for UDF APIs
[ https://issues.apache.org/jira/browse/HIVE-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102112#comment-16102112 ] Sahil Takiar commented on HIVE-17132: - [~ashutoshc] - created RB: https://reviews.apache.org/r/61145/ > Add InterfaceAudience and InterfaceStability annotations for UDF APIs > - > > Key: HIVE-17132 > URL: https://issues.apache.org/jira/browse/HIVE-17132 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17132.1.patch > > > Add InterfaceAudience and InterfaceStability annotations for UDF APIs. UDFs > are a useful plugin point for Hive users, and there are a number of external > UDF libraries, such as hivemall. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Status: Open (was: Patch Available) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: HIVE-16077.07.patch > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Status: Patch Available (was: Open) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17132) Add InterfaceAudience and InterfaceStability annotations for UDF APIs
[ https://issues.apache.org/jira/browse/HIVE-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102086#comment-16102086 ] Ashutosh Chauhan commented on HIVE-17132: - Can you please create a RB for this patch? > Add InterfaceAudience and InterfaceStability annotations for UDF APIs > - > > Key: HIVE-17132 > URL: https://issues.apache.org/jira/browse/HIVE-17132 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17132.1.patch > > > Add InterfaceAudience and InterfaceStability annotations for UDF APIs. UDFs > are a useful plugin point for Hive users, and there are a number of external > UDF libraries, such as hivemall. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102084#comment-16102084 ] Ashutosh Chauhan commented on HIVE-16811: - Can you please create a RB for the latest patch? > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch > > > Currently join ordering completely bails out in the absence of statistics and > this could lead to bad joins such as cross joins. > e.g. the following select query will produce a cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING) > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl > int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it come up > with a join at least better than a cross join -- This message was sent by Atlassian JIRA (v6.4.14#64029)
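[Editor's note] One simple way to "estimate statistics in absence of stats" is to derive a row-count guess from the on-disk data size and an assumed average row width, so the join-ordering cost model has a non-zero cardinality to work with. This is an editor's sketch of the general idea, not the estimator implemented by the HIVE-16811 patch:

```java
// When real column/table stats are missing, estimate the row count from raw
// data size and a guessed average row width. Returning at least 1 keeps the
// cost model from treating the table as empty. (Illustrative only, not Hive's
// actual StatsRulesProcFactory logic.)
public class BasicStatsEstimator {
    public static long estimateRowCount(long rawDataSizeBytes, int assumedAvgRowBytes) {
        if (assumedAvgRowBytes <= 0) {
            return 1; // no usable width guess; fall back to a minimal estimate
        }
        return Math.max(1, rawDataSizeBytes / assumedAvgRowBytes);
    }
}
```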
[jira] [Commented] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102043#comment-16102043 ] Hive QA commented on HIVE-17088: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878993/HIVE-17088.addendum1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[setop_no_distinct] (batchId=77) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6134/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6134/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6134/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12878993 - PreCommit-HIVE-Build > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 3.9 and excluding several other > dependencies on HIVE-16049, the HS2 webui stopped working and throwing a NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at 
org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.Select
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101841#comment-16101841 ] Sahil Takiar commented on HIVE-17087: - [~lirui], [~kellyzly] patch has been updated, any other comments? > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select 
Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 > keys: >
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Attachment: (was: HIVE-17088.addendum1.patch) > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 3.9 and excluding several other > dependencies on HIVE-16049, the HS2 webui stopped working and throwing a NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: HIVE-16077.06.patch > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101771#comment-16101771 ] Andrew Sherman commented on HIVE-17128: --- Thank you [~pvary]! > Operation Logging leaks file descriptors as the log4j Appender is never closed > -- > > Key: HIVE-17128 > URL: https://issues.apache.org/jira/browse/HIVE-17128 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Andrew Sherman >Assignee: Andrew Sherman > Fix For: 3.0.0 > > Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch, > HIVE-17128.3.patch > > > [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 > RoutingAppender to automatically output the log for each query into each > individual operation log file. As log4j does not know when a query is > finished it keeps the OutputStream in the Appender open even when the query > completes. The stream holds a file descriptor and so we leak file > descriptors. Note that we are already careful to close any streams reading > from the operation log file. > h2. Fix > To fix this we use a technique described in the comments of [LOG4J2-510] > which uses reflection to close the appender. The test in > TestOperationLoggingLayout will be extended to check that the Appender is > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
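[Editor's note] The fix described above reaches into log4j internals via reflection to close the leaked appender stream. The log4j-specific field and method names come from the LOG4J2-510 discussion and are not reproduced here; the snippet below is a generic, stdlib-only demonstration of the reflection pattern itself (read a private field, then close the resource it holds), with made-up class and field names:

```java
import java.lang.reflect.Field;

// Generic illustration of the reflection technique the fix uses: access a
// private field on an object and close the Closeable it holds. The real
// HIVE-17128 patch applies this pattern to log4j's RoutingAppender internals
// (per LOG4J2-510); Holder and "resource" here are hypothetical stand-ins.
public class ReflectiveCloser {
    public static class Holder {
        private final java.io.Closeable resource;
        public Holder(java.io.Closeable resource) { this.resource = resource; }
    }

    public static void closePrivateResource(Object target, String fieldName) throws Exception {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true); // bypass private access, as the patch must do
        ((java.io.Closeable) f.get(target)).close();
    }
}
```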
[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101749#comment-16101749 ] Peter Vary commented on HIVE-17050: --- Thanks for the patch [~Yibing]! Just one quick question: - Why are there pom.xml changes in the patch? Otherwise looks good to me. BTW: I am not sure that the patch will be picked up by the pre-commit if the extension is in upper case. Please check it later if you have time. Thanks, Peter > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH > > > After applying HIVE-13864, multi-line queries that have a comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Attachment: HIVE-17088.addendum1.patch All tests are flaky and past jobs are running into the same errors, but there is one test {{TestJdbcWithMiniHS2}} that is new. It does not look related to the patch, though, but I will restart HiveQA to see if it continues failing. It does not fail on my local machine. > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch, > HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 3.9 and excluding several other > dependencies on HIVE-16049, the HS2 webui stopped working and throwing an NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. 
Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17050: -- Attachment: HIVE-17050.3.PATCH Submit a new patch > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
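[Editor's note] The bug above is about "--" line comments surviving into a query that is later joined into a single line, which corrupts the statement. The sketch below illustrates the kind of per-line comment trimming involved; it is an editor's simplified illustration, not the actual Beeline/HIVE-13864 code, and its quote handling is deliberately minimal (a "--" inside single quotes is preserved):

```java
// Illustrative sketch (not Beeline's implementation): strip "--" line
// comments from each line of a multi-line query before the lines are joined,
// keeping "--" sequences that appear inside single-quoted literals.
public class CommentStripper {
    public static String stripLineComments(String multiLineQuery) {
        StringBuilder out = new StringBuilder();
        for (String line : multiLineQuery.split("\n", -1)) {
            boolean inQuote = false;
            int cut = line.length();
            for (int i = 0; i < line.length() - 1; i++) {
                char ch = line.charAt(i);
                if (ch == '\'') {
                    inQuote = !inQuote;
                } else if (!inQuote && ch == '-' && line.charAt(i + 1) == '-') {
                    cut = i; // comment starts here; drop the rest of the line
                    break;
                }
            }
            String kept = line.substring(0, cut).trim();
            if (kept.isEmpty()) {
                continue; // the whole line was a comment or whitespace
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(kept);
        }
        return out.toString();
    }
}
```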
[jira] [Commented] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101538#comment-16101538 ]

anishek commented on HIVE-16896:
--------------------------------
Self reminder to add docs for "hive.repl.approx.max.load.tasks".

> move replication load related work in semantic analysis phase to execution phase using a task
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>
> We do not want to create too many tasks in memory in the analysis phase while loading data. Currently we load all the files in the bootstrap dump location as {{FileStatus[]}} and then iterate over it to load objects; we should rather move to
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which would internally batch and return values.
> Additionally, since we can't hand off partial tasks from the analysis phase to the execution phase, we are going to move the whole repl load functionality to the execution phase so we can better control creation/execution of tasks (not related to hive {{Task}}; we may get rid of ReplCopyTask).
> An additional consideration at the end of this JIRA is whether we want to specifically do a multi-threaded load of the bootstrap dump.
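The batching behaviour the comment relies on can be sketched without any Hadoop dependency: an iterator that pulls one fixed-size batch at a time from its source on demand, instead of materializing the full listing (as a `FileStatus[]` would). The class and names below are illustrative stand-ins for the `RemoteIterator` pattern, not Hadoop's implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Illustrative sketch of the RemoteIterator idea: fetch results in
// small batches on demand, keeping at most one batch in memory.
// (offset, size) -> batch is a stand-in for the underlying listing call.
public class BatchedIterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchBatch;
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int offset = 0;
    private boolean exhausted = false;

    public BatchedIterator(BiFunction<Integer, Integer, List<T>> fetchBatch, int batchSize) {
        this.fetchBatch = fetchBatch;
        this.batchSize = batchSize;
    }

    public boolean hasNext() {
        if (buffer.isEmpty() && !exhausted) {
            List<T> batch = fetchBatch.apply(offset, batchSize);
            offset += batch.size();
            if (batch.size() < batchSize) exhausted = true; // short batch = end of listing
            buffer.addAll(batch);
        }
        return !buffer.isEmpty();
    }

    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.poll();
    }
}
```

Callers iterate with `hasNext()`/`next()` exactly as they would with Hadoop's `RemoteIterator<LocatedFileStatus>`, so memory stays bounded by the batch size rather than the directory size.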
[jira] [Updated] (HIVE-16614) Support "set local time zone" statement
[ https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-16614:
-------------------------------------------
    Attachment: HIVE-16614.03.patch

> Support "set local time zone" statement
> ---------------------------------------
>
>                 Key: HIVE-16614
>                 URL: https://issues.apache.org/jira/browse/HIVE-16614
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Carter Shanklin
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-16614.01.patch, HIVE-16614.02.patch, HIVE-16614.03.patch, HIVE-16614.patch
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently applied when converting between timezone-unaware types and timezone-aware types and, in Hive's case, are also used to shift a timezone-aware type to a different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a session level, so that clients can access a database simultaneously from different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be more convenient for users if they have the ability to set their time zone of choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first using an interval and second using the special keyword LOCAL.
> Examples:
> • set time zone '-8:00';
> • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the session's original default time zone displacement.
> Reference: SQL:2011 section 19.4
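The two SQL forms described above could be parsed roughly as follows with `java.time`. The helper name is hypothetical and this is not Hive's actual implementation — just a sketch of mapping the statement's argument to a zone:

```java
import java.time.ZoneId;

// Illustrative parser for the two "set time zone" forms above:
//   set time zone '-8:00';   -> a fixed offset displacement
//   set time zone LOCAL;     -> the session's original (system) zone
// Hypothetical helper, not Hive's actual code.
public class TimeZoneSetting {
    public static ZoneId parse(String value) {
        if ("LOCAL".equalsIgnoreCase(value)) {
            return ZoneId.systemDefault();
        }
        // java.time requires two-digit hours ("-08:00"); SQL allows
        // single-digit forms like '-8:00', so pad the hour if needed.
        String v = value;
        if (v.matches("[+-]\\d:\\d{2}")) {
            v = v.charAt(0) + "0" + v.substring(1);
        }
        return ZoneId.of(v);   // offset strings yield a ZoneOffset
    }
}
```

Each session would carry its own parsed `ZoneId`, so concurrent clients in different time zones see time values displaced independently.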
[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-17128:
------------------------------
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved  (was: Patch Available)

Pushed to master. Thanks for your contribution [~asherman]!

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-17128
>                 URL: https://issues.apache.org/jira/browse/HIVE-17128
>             Project: Hive
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Andrew Sherman
>            Assignee: Andrew Sherman
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch, HIVE-17128.3.patch
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 RoutingAppender to automatically output the log for each query into each individual operation log file. As log4j does not know when a query is finished, it keeps the OutputStream in the Appender open even when the query completes. The stream holds a file descriptor, and so we leak file descriptors. Note that we are already careful to close any streams reading from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] which uses reflection to close the appender. The test in TestOperationLoggingLayout will be extended to check that the Appender is closed.
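The LOG4J2-510 technique mentioned in the fix boils down to reaching a resource held in a private field that has no public close path, and closing it via reflection. The real patch targets log4j2's RoutingAppender internals; the sketch below is a generic plain-Java illustration of the reflective-close idea only, with stand-in names:

```java
import java.io.Closeable;
import java.lang.reflect.Field;

// Generic illustration of the reflective-close idea (NOT the actual
// log4j2 fix): when a library keeps an open stream in a private field
// with no public way to close it, reach the field via reflection and
// close it so the file descriptor is released.
public class ReflectiveCloser {
    public static void closePrivateField(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);                 // bypass private access
            Object value = f.get(target);
            if (value instanceof Closeable) {
                ((Closeable) value).close();       // releases the descriptor
            }
        } catch (Exception e) {
            throw new RuntimeException("failed to close " + fieldName, e);
        }
    }
}
```

A test (as the description proposes for TestOperationLoggingLayout) would then assert that the stream is actually closed after the query's appender is torn down.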
[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-17072:
------------------------------
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved  (was: Patch Available)

Thanks for the patch [~kuczoram]! Pushed to master.
Please take some time and document the new config here: https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-BeelineQueryUnitTest
Thanks, Peter

> Make the parallelized timeout configurable in BeeLine tests
> -----------------------------------------------------------
>
>                 Key: HIVE-17072
>                 URL: https://issues.apache.org/jira/browse/HIVE-17072
>             Project: Hive
>          Issue Type: Improvement
>          Components: Testing Infrastructure
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17072.1.patch, HIVE-17072.2.patch
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
>     executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
>     throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.
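One straightforward way to make the hardcoded 10-minute wait configurable is to read it from a system property with the old value as default. The property name below is illustrative; the actual name chosen by the patch may differ:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: replace the hardcoded awaitTermination(10, MINUTES) with a
// value read from a system property. "test.beeline.parallel.timeout"
// is an illustrative name, not necessarily the one the patch uses.
public class ConfigurableShutdown {
    public static long timeoutMinutes() {
        // Long.getLong reads the system property, falling back to 10.
        return Long.getLong("test.beeline.parallel.timeout", 10L);
    }

    public static boolean shutdownAndWait(ExecutorService executor) {
        executor.shutdown();
        try {
            // true if all tasks finished within the configured window
            return executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
        } catch (InterruptedException exc) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(exc);
        }
    }
}
```

Slow test environments can then pass e.g. `-Dtest.beeline.parallel.timeout=30` on the Maven command line without touching Parallelized.java.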
[jira] [Updated] (HIVE-17052) Remove logging of predicate filters
[ https://issues.apache.org/jira/browse/HIVE-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-17052:
------------------------------
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved  (was: Patch Available)

Thanks for the patch [~Yibing]. Pushed to master.

> Remove logging of predicate filters
> -----------------------------------
>
>                 Key: HIVE-17052
>                 URL: https://issues.apache.org/jira/browse/HIVE-17052
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Barna Zsombor Klara
>            Assignee: Yibing Shi
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17052.1.patch
>
> HIVE-16869 added the filter predicate to the debug log of HS2, but since these filters may contain sensitive information they should not be logged.
> The log statement should be changed back to its original form.
[jira] [Comment Edited] (HIVE-16679) Missing ASF header on properties file in ptest2 project
[ https://issues.apache.org/jira/browse/HIVE-16679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101362#comment-16101362 ]

Peter Vary edited comment on HIVE-16679 at 7/26/17 8:51 AM:
------------------------------------------------------------
Thanks for the patch [~zsombor.klara]. Pushed to master.

was (Author: pvary):
Thanks for the patch [~zsombor.klara]

> Missing ASF header on properties file in ptest2 project
> --------------------------------------------------------
>
>                 Key: HIVE-16679
>                 URL: https://issues.apache.org/jira/browse/HIVE-16679
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>            Priority: Trivial
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16679.01.patch
>
> The ASF header is missing on {{testutils/ptest2//conf/deployed/master-mr2.properties}}, causing the build of the ptest2 project to fail on a RAT check.
[jira] [Updated] (HIVE-16679) Missing ASF header on properties file in ptest2 project
[ https://issues.apache.org/jira/browse/HIVE-16679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-16679:
------------------------------
    Resolution: Fixed
    Status: Resolved  (was: Patch Available)

Thanks for the patch [~zsombor.klara]

> Missing ASF header on properties file in ptest2 project
> --------------------------------------------------------
>
>                 Key: HIVE-16679
>                 URL: https://issues.apache.org/jira/browse/HIVE-16679
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>            Priority: Trivial
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16679.01.patch
>
> The ASF header is missing on {{testutils/ptest2//conf/deployed/master-mr2.properties}}, causing the build of the ptest2 project to fail on a RAT check.
[jira] [Commented] (HIVE-16954) LLAP IO: better debugging
[ https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101303#comment-16101303 ]

Lefty Leverenz commented on HIVE-16954:
---------------------------------------
Doc note & question: This adds *hive.llap.io.trace.size* and *hive.llap.io.trace.always.dump* to HiveConf.java, so they need to be documented in the wiki. The default value of *hive.llap.io.trace.always.dump* was true in the initial commit but changed to false in an addendum commit (#20276d2113f669a2ea08480ce76df9bd6b913d09 and #726f270a6e5c720a98ac58f2c4a549e70b45fbad).

* [Configuration Properties -- LLAP I/O | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAPI/O]

Added a TODOC2.4 label.

Question: In the description of *hive.llap.io.trace.always.dump*, what does "the default is on error" mean? It was in the initial commit as well as the addendum commit, so I'm not sure if it refers to the true or false value.

> LLAP IO: better debugging
> -------------------------
>
>                 Key: HIVE-16954
>                 URL: https://issues.apache.org/jira/browse/HIVE-16954
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: TODOC2.4
>             Fix For: 3.0.0, 2.4.0
>
>         Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch
>
[jira] [Updated] (HIVE-16954) LLAP IO: better debugging
[ https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-16954:
----------------------------------
    Labels: TODOC2.4  (was: )

> LLAP IO: better debugging
> -------------------------
>
>                 Key: HIVE-16954
>                 URL: https://issues.apache.org/jira/browse/HIVE-16954
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: TODOC2.4
>             Fix For: 3.0.0, 2.4.0
>
>         Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch
>
[jira] [Updated] (HIVE-17177) move TestSuite.java to the right position
[ https://issues.apache.org/jira/browse/HIVE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saijin Huang updated HIVE-17177:
--------------------------------
    Attachment: HIVE-17177.1.patch

> move TestSuite.java to the right position
> ------------------------------------------
>
>                 Key: HIVE-17177
>                 URL: https://issues.apache.org/jira/browse/HIVE-17177
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Saijin Huang
>            Assignee: Saijin Huang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17177.1.patch
>
> TestSuite.java does not currently belong to the package org.apache.hive.storage.jdbc. Move it to the right position.