[jira] [Commented] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102789#comment-16102789 ] Hive QA commented on HIVE-17179: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879088/HIVE-17179.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=142) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6143/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6143/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6143/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited 
with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879088 - PreCommit-HIVE-Build > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17179.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102777#comment-16102777 ] Matt McCline commented on HIVE-12369: - #5 cratered. > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, > no more than 31 columns. > 3 new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
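The operator names above come from the patch; the underlying idea, counts stored directly in the slot table with no per-group hash element allocation, can be sketched as follows. This is a hypothetical toy (fixed power-of-two capacity, no resizing or spill handling), not the actual Hive classes:

```java
// Hypothetical sketch of a COUNT aggregation over a single long key.
// Counts live directly in parallel flat arrays (the "slot table"),
// so no per-group hash element objects are allocated.
public class LongKeyCountHash {
    private final long[] keys;
    private final long[] counts;
    private final boolean[] used;

    public LongKeyCountHash(int capacity) {
        // capacity must be a power of two for the mask-based probing below
        keys = new long[capacity];
        counts = new long[capacity];
        used = new boolean[capacity];
    }

    // Linear probing; the count is stored in the slot itself.
    public void add(long key) {
        int mask = keys.length - 1;
        int slot = Long.hashCode(key) & mask;
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        if (!used[slot]) {
            used[slot] = true;
            keys[slot] = key;
        }
        counts[slot]++;
    }

    // A vectorized operator hands over the whole key column of a batch at once.
    public void addBatch(long[] keyVector, int size) {
        for (int i = 0; i < size; i++) {
            add(keyVector[i]);
        }
    }

    public long countOf(long key) {
        int mask = keys.length - 1;
        int slot = Long.hashCode(key) & mask;
        while (used[slot]) {
            if (keys[slot] == key) {
                return counts[slot];
            }
            slot = (slot + 1) & mask;
        }
        return 0;
    }
}
```

The point of the design is allocation behavior: a batch of keys updates flat arrays in a tight loop, with no boxing and no hash-entry objects for the garbage collector to chase.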
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Status: In Progress (was: Patch Available) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, > no more than 31 columns. > 3 new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Attachment: HIVE-12369.06.patch > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, > no more than 31 columns. > 3 new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102762#comment-16102762 ] Rui Li commented on HIVE-16948: --- Is it possible that the DPP work doesn't contain branches, and therefore when the target work is gone, the whole DPP work/task should be removed? > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948_1.patch, HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > 
aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator
[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task
[ https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102718#comment-16102718 ] Hive QA commented on HIVE-17113: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879072/HIVE-17113.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6142/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6142/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6142/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12879072 - PreCommit-HIVE-Build > Duplicate bucket files can get written to table by runaway task > --- > > Key: HIVE-17113 > URL: https://issues.apache.org/jira/browse/HIVE-17113 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, > HIVE-17113.3.patch > > > Saw a table get a duplicate bucket file from a Hive query. It looks like the > following happened: > 1. Task attempt A_0 starts, but then stops making progress > 2. The job was running with speculative execution on, and task attempt A_1 is > started > 3. Task attempt A_1 finishes execution and saves its output to the temp > directory. > 4. A task kill is sent to A_0, though this does not appear to actually kill A_0 > 5. The job for the query finishes and Utilities.mvFileToFinalPath() calls > Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files > 6. A_0 (still running) finally finishes and saves its file to the temp > directory. At this point we now have duplicate bucket files - oops! > 7. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the > final location, where it is later moved to the partition directory. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
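One way to resolve such speculative-attempt races is to group output files by task ID and keep exactly one file per task. The sketch below is hypothetical, not the actual Utilities.removeTempOrDuplicateFiles() logic, and its "taskId_attempt" naming convention is illustrative; it keeps the largest file per task on the assumption that a runaway attempt's duplicate may be partially written:

```java
import java.util.Map;
import java.util.HashMap;
import java.util.TreeMap;

// Hypothetical deduplication of speculative-attempt output files.
// Input: file name (taskId_attempt) -> size. Output: one file per task.
public class DedupAttempts {
    public static Map<String, Long> keepOnePerTask(Map<String, Long> fileSizes) {
        Map<String, String> chosenName = new HashMap<>();
        Map<String, Long> chosenSize = new HashMap<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            // Task ID is the name minus the trailing attempt suffix (illustrative).
            String taskId = e.getKey().substring(0, e.getKey().lastIndexOf('_'));
            Long best = chosenSize.get(taskId);
            if (best == null || e.getValue() > best) {
                chosenName.put(taskId, e.getKey());
                chosenSize.put(taskId, e.getValue());
            }
        }
        // Rebuild the surviving file set (TreeMap for deterministic order).
        Map<String, Long> result = new TreeMap<>();
        for (String name : chosenName.values()) {
            result.put(name, fileSizes.get(name));
        }
        return result;
    }
}
```

Note that this only closes the window described above if it runs after all attempts have stopped writing; as step 6 in the ticket shows, a still-running attempt can add a duplicate after the check.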
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Affects Version/s: 2.2.0 > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
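The shape of the workaround the description implies is to extend the record schema with the partition-key columns before calling setSchema(). The HCat calls in the comment below are hedged (check the API of your HCatalog version); the runnable code models only the schema-completion step with plain lists standing in for HCatSchema:

```java
import java.util.List;
import java.util.ArrayList;

// Why getTableSchema() alone is not enough for dynamic partitioning:
// it returns only the record columns, so the partition-key columns must
// be appended before setSchema(). With the real API this looks roughly
// like (hedged, version-dependent):
//
//   HCatSchema schema = HCatOutputFormat.getTableSchema(job.getConfiguration());
//   for (HCatFieldSchema partKey : partitionColumns.getFields()) {
//       schema.append(partKey);
//   }
//   HCatOutputFormat.setSchema(job, schema);
//
public class CompleteOutputSchema {
    public static List<String> complete(List<String> recordSchema, List<String> partitionKeys) {
        List<String> full = new ArrayList<>(recordSchema);
        for (String key : partitionKeys) {
            if (!full.contains(key)) {
                full.add(key);  // partition keys conventionally come last
            }
        }
        return full;
    }
}
```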
[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17153: Attachment: HIVE-17153.2.patch > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Affects Version/s: (was: 0.13.1) 2.2.0 Status: Patch Available (was: Open) > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch, HIVE-8472.branch-2.2.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Status: Patch Available (was: Open) > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Attachment: HIVE-17181.1.patch Patch for master. > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Attachment: HIVE-17181.branch-2.patch Patch for branch-2. > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17181: --- > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Attachment: HIVE-8472.branch-2.2.patch And here's one for branch-2. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 0.13.1 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch, HIVE-8472.branch-2.2.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Attachment: HIVE-8472.1.patch Here's a patch for the master branch. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 0.13.1 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102659#comment-16102659 ] Rui Li commented on HIVE-17087: --- +1 > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > 
outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 > keys: > 0 _col1 (type: int) > 1 _col1 (typ
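The cleanup the patch performs can be sketched abstractly. This is a toy, with plain strings standing in for Hive's work/operator classes, under the assumption (hedged, inferred from the issue title and plan above) that a Spark partition pruning sink whose target work now arrives via the map join's broadcast path no longer serves any purpose and its branch can be dropped:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model: each pruning sink targets a named map work. After map-join
// conversion, sinks whose targets are covered by the broadcast are removed.
public class DppCleanup {
    public static Map<String, String> removeUselessSinks(
            Map<String, String> sinkToTargetWork, Set<String> broadcastWorks) {
        Map<String, String> kept = new TreeMap<>();
        for (Map.Entry<String, String> e : sinkToTargetWork.entrySet()) {
            if (!broadcastWorks.contains(e.getValue())) {
                kept.put(e.getKey(), e.getValue());  // still a useful pruning target
            }
        }
        return kept;
    }
}
```

In the plan above that would correspond to dropping the Stage-2 branch ending in the Spark Partition Pruning Sink Operator targeting Map 2, since Stage-3 already hashes and broadcasts that table.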
[jira] [Commented] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102657#comment-16102657 ] Hive QA commented on HIVE-17153: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879068/HIVE-17153.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6141/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6141/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6141/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879068 - PreCommit-HIVE-Build > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796 ] Ke Jia edited comment on HIVE-17139 at 7/27/17 3:41 AM: With this patch, I tested "select case when a=1 then trim(b) end from test_orc_5000" on my development machine. The data scale is almost 50 million records in table test_orc_5000(a int, b string) stored as ORC. The execution engine is Spark. I ran three experiments; the averages are in the table below. The results show the Spark execution time dropped from 35.76s to 32.57s, the time spent in VectorSelectOperator dropped from 3.12s to 0.89s, and the count of then-expression evaluations went from 4735 to 5000712. || ||Non-optimization||Optimization||Improvement|| |Hos|35.76s|32.57s|8.9%| |VectorSelectOperator|3.12s|0.89s|71.5%| |count|4735|5000712|8.99%| was (Author: jk_self): With this patch, I test "select case when a=1 then trim(b) end from test_orc_5000" in my development machine. The data scale is almost 50 million records in table test_orc_5000(a int, b string) stored as ORC. The execution engine is spark. I do three experiments and the average value is as below table. The result shows the execution time of spark from 35.76s to 32.57s, the time cost of VectorSelectOperator from 3.12s to 0.89s and the count of then expression evaluation from 4735 to 5000712. || ||Non-optimization||Optimization||Improvement|| |Hos|35.76s|32.57s|8.9%| |VectorSelectOperator|3.12s|0.89s|7.15%| |count|4735|5000712|8.99%| > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. 
> > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The CASE WHEN and IF statement execution for Hive vectorization is not > optimal: in the current implementation, all of the conditional and else > expressions are evaluated. The optimized approach is to update the selected > array of the batch parameter after the conditional expression is executed. > Then the else expression will only process the selected rows instead of all > of them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
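The selected-array technique described above can be sketched roughly as follows. MiniBatch here is a hypothetical stand-in for Hive's VectorizedRowBatch; the real operator works on column vectors, so this illustrates only the control flow (narrow the selected array on the condition, then evaluate the THEN expression over just those rows), not the actual implementation.

```java
// Sketch of the selected-array optimization for "case when a=1 then trim(b) end".
// MiniBatch is a simplified stand-in for Hive's VectorizedRowBatch.
public class SelectedArraySketch {
    public static class MiniBatch {
        public int[] a;
        public String[] b;
        public int[] selected; // indices of rows still in play
        public int size;       // number of valid entries in selected

        public MiniBatch(int[] a, String[] b) {
            this.a = a;
            this.b = b;
            this.size = a.length;
            this.selected = new int[size];
            for (int i = 0; i < size; i++) selected[i] = i;
        }
    }

    // Narrow the selected array to rows where the condition (a == 1) holds,
    // so later expressions touch only those rows.
    public static void applyCondition(MiniBatch batch) {
        int newSize = 0;
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            if (batch.a[row] == 1) batch.selected[newSize++] = row;
        }
        batch.size = newSize;
    }

    // Evaluate the THEN expression (trim) only over the selected rows;
    // returns how many evaluations actually happened.
    public static int evaluateThen(MiniBatch batch) {
        int evaluations = 0;
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            batch.b[row] = batch.b[row].trim();
            evaluations++;
        }
        return evaluations;
    }

    public static void main(String[] args) {
        MiniBatch batch = new MiniBatch(
            new int[]{1, 2, 1, 3},
            new String[]{" x ", " y ", " z ", " w "});
        applyCondition(batch);
        System.out.println(evaluateThen(batch)); // trim runs for 2 rows, not 4
    }
}
```

Without the condition-first narrowing, the unoptimized path would evaluate trim for all four rows and discard two of the results.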
[jira] [Assigned] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-8472: -- Assignee: Mithun Radhakrishnan > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 0.13.1 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
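A hypothetical usage of the requested statement, modeled on the existing table-level syntax (the database name and path below are illustrative only, not syntax confirmed by this ticket):

```sql
-- Hypothetical, mirroring ALTER TABLE tablename SET LOCATION;
-- database name and HDFS path are placeholders.
ALTER DATABASE sales_db SET LOCATION 'hdfs://namenode:8020/warehouse/sales_db.db';
```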
[jira] [Commented] (HIVE-17146) Spark on Hive - Exception while joining tables - "Requested replication factor of 10 exceeds maximum of x"
[ https://issues.apache.org/jira/browse/HIVE-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102625#comment-16102625 ] Rui Li commented on HIVE-17146: --- [~cabot], the code intends to distribute the hash table to more nodes so that following tasks are more likely to get the data from local DN. In that sense, it's intended to be bigger than {{dfs.replication}}. That's why we chose the magic number 10 (not an ideal solution I agree). However, since {{minReplication = Math.min(minReplication, dfsMaxReplication)}}, I still don't understand how the replication factor exceeds {{dfs.replication.max}} (by default 512)? > Spark on Hive - Exception while joining tables - "Requested replication > factor of 10 exceeds maximum of x" > --- > > Key: HIVE-17146 > URL: https://issues.apache.org/jira/browse/HIVE-17146 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.1.1, 3.0.0 >Reporter: George Smith >Assignee: Ashutosh Chauhan > > We found a bug in the current implementation of > [org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java] > The *magic number 10* for minReplication factor can cause the exception when > the configuration parameter _dfs.replication_ is lower than 10. > Consider these [properties > configuration|https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml] > on our cluster (with less than 10 nodes): > {code} > dfs.namenode.replication.min=1 > dfs.replication=2 > dfs.replication.max=512 (that's the default value) > {code} > The current implementation counts target file replication as follows > (relevant snippets of the code): > {code} > private int minReplication = 10; > ... 
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > ... > FileSystem fs = path.getFileSystem(htsOperator.getConfiguration()); > short replication = fs.getDefaultReplication(path); > ... > int numOfPartitions = replication; > replication = (short) Math.max(minReplication, numOfPartitions); > //use replication value in fs.create(path, replication); > {code} > With the current code, the replication value used is 10 and the config value > _dfs.replication_ is not used at all. > There are several (easy) ways to fix it: > # Set the field {code}private int minReplication = 1;{code} I don't see any > obvious reason for the value 10. Or > # Initialize minReplication from the config value _dfs.namenode.replication.min_, > with a default value of 1. Or > # Compute the replication this way: {code}replication = Math.min(numOfPartitions, > dfsMaxReplication);{code} Or > # Use {code}replication = numOfPartitions;{code} directly. > The config value _dfs.replication_ has a default value of 3, which is supposed to > always be lower than _dfs.replication.max_, so no check is probably needed. > Any suggestions on which option to choose? > As a *workaround* for this issue we had to set dfs.replication.max=2, but > obviously the _dfs.replication_ value should NOT be ignored and the problem > should be resolved. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
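To make the interaction concrete, the snippet below contrasts the arithmetic of the quoted code with fix option 3 from the report. The method names are mine; only the min/max logic follows the quoted implementation, and the configuration values come from the cluster described in the report.

```java
// Sketch of the replication computation in SparkHashTableSinkOperator
// versus the proposed fix (option 3 in the report above).
public class ReplicationSketch {
    static final int MIN_REPLICATION = 10; // the magic number in question

    // Current behavior: dfs.replication is ignored once it falls below 10.
    public static short currentReplication(short dfsReplication, int dfsMaxReplication) {
        int minReplication = Math.min(MIN_REPLICATION, dfsMaxReplication);
        int numOfPartitions = dfsReplication; // stands in for fs.getDefaultReplication(path)
        return (short) Math.max(minReplication, numOfPartitions);
    }

    // Proposed option 3: cap the partition count by dfs.replication.max
    // so dfs.replication is honored on small clusters.
    public static short proposedReplication(short dfsReplication, int dfsMaxReplication) {
        int numOfPartitions = dfsReplication;
        return (short) Math.min(numOfPartitions, dfsMaxReplication);
    }

    public static void main(String[] args) {
        // dfs.replication=2, dfs.replication.max=512 (values from the report):
        System.out.println(currentReplication((short) 2, 512));  // 10, exceeds cluster size
        System.out.println(proposedReplication((short) 2, 512)); // 2, honors dfs.replication
    }
}
```

This also shows why the reported workaround (dfs.replication.max=2) helps: it drags minReplication down to 2 through the Math.min call.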
[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102615#comment-16102615 ] Hive QA commented on HIVE-17050: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879062/HIVE-17050.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join4] (batchId=81) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6140/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6140/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6140/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase 
Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879062 - PreCommit-HIVE-Build > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH, HIVE-17050.4.patch > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
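A plausible model of the failure mode above: if the client collapses a multiline query onto one line before handling "--" comments, the comment swallows the remainder of the statement. This is an illustration of that hazard only, not Beeline's actual code path, and the naive indexOf-based stripping would also mangle string literals containing "--".

```java
// Sketch of why "select 1, --test\n2" breaks when lines are joined before
// comment stripping, and how stripping per line first keeps the query intact.
public class CommentStripSketch {
    // Naive handling: join lines first, so "--test" swallows the "2".
    public static String joinThenStrip(String query) {
        String oneLine = query.replace('\n', ' ');
        int idx = oneLine.indexOf("--");
        return (idx >= 0 ? oneLine.substring(0, idx) : oneLine).trim();
    }

    // Comment-aware handling: strip the comment from each line, then join.
    public static String stripThenJoin(String query) {
        StringBuilder sb = new StringBuilder();
        for (String line : query.split("\n")) {
            int idx = line.indexOf("--");
            sb.append(idx >= 0 ? line.substring(0, idx) : line).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String query = "select 1, --test\n2";
        System.out.println(joinThenStrip(query));  // truncated: "select 1," fails to parse
        System.out.println(stripThenJoin(query));  // intact: "select 1,  2"
    }
}
```

The truncated form matches the ParseException in the report, which complains about unexpected end of input right after the select list.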
[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16948: Attachment: HIVE-16948_1.patch upload HIVE-16948_1.patch to trigger HiveQA. > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948_1.patch, HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > 
COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash >
[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102562#comment-16102562 ] Eugene Koifman commented on HIVE-16077: --- No related failures for HIVE-16077.08.patch. [~prasanth_j], could you review please? > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: (was: HIVE-16077.07.patch) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: (was: HIVE-16077.06.patch) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: (was: HIVE-16077.05.patch) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Patch Available (was: Open) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: HIVE-17139.3.patch > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Open (was: Patch Available) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: (was: HIVE-17139.3.patch) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17176) Add ASF header for LlapAllocatorBuffer.java
[ https://issues.apache.org/jira/browse/HIVE-17176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saijin Huang updated HIVE-17176: Affects Version/s: 3.0.0 > Add ASF header for LlapAllocatorBuffer.java > --- > > Key: HIVE-17176 > URL: https://issues.apache.org/jira/browse/HIVE-17176 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Saijin Huang >Assignee: Saijin Huang >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17176.1.patch > > > Reproduced the problem from HIVE-16233 and found the ASF header missing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17177) move TestSuite.java to the right position
[ https://issues.apache.org/jira/browse/HIVE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saijin Huang updated HIVE-17177: Affects Version/s: 3.0.0 > move TestSuite.java to the right position > - > > Key: HIVE-17177 > URL: https://issues.apache.org/jira/browse/HIVE-17177 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Saijin Huang >Assignee: Saijin Huang >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17177.1.patch > > > TestSuite.java does not currently belong to the package > org.apache.hive.storage.jdbc. Move it to the right location. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102541#comment-16102541 ] Hive QA commented on HIVE-16077: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879060/HIVE-16077.08.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11016 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6139/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6139/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6139/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879060 - PreCommit-HIVE-Build > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Patch Available (was: Open) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Status: Open (was: Patch Available) > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, > HIVE-17139.3.patch > > > The case when and if statement execution for Hive vectorization is not > optimal, which all the conditional and else expressions are evaluated for > current implementation. The optimized approach is to update the selected > array of batch parameter after the conditional expression is executed. Then > the else expression will only do the selected rows instead of all. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17179: Status: Patch Available (was: Open) > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17179.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17179: Attachment: HIVE-17179.1.patch > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17179.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102502#comment-16102502 ] Sahil Takiar commented on HIVE-16948: - Overall, the approach LGTM. You may need to re-attach the patch, seem Hive QA hasn't run yet. > Invalid explain when running dynamic partition pruning query in Hive On Spark > - > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948.patch > > > in > [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] > in spark_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > 
Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) >
[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task
[ https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17113: -- Attachment: HIVE-17113.3.patch > Duplicate bucket files can get written to table by runaway task > --- > > Key: HIVE-17113 > URL: https://issues.apache.org/jira/browse/HIVE-17113 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, > HIVE-17113.3.patch > > > Saw a table get a duplicate bucket file from a Hive query. It looks like the > following happened: > 1. Task attempt A_0 starts, but then stops making progress > 2. The job was running with speculative execution on, and task attempt A_1 is > started > 3. Task attempt A_1 finishes execution and saves its output to the temp > directory. > 4. A task kill is sent to A_0, though this does not appear to actually kill A_0 > 5. The job for the query finishes and Utilities.mvFileToFinalPath() calls > Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files > 6. A_0 (still running) finally finishes and saves its file to the temp > directory. At this point we now have duplicate bucket files - oops! > 7. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the > final location, where it is later moved to the partition directory. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
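The race above hinges on the timing of the duplicate-file check: removeTempOrDuplicateFiles() runs before the runaway attempt's late file lands, so the duplicate is never seen. The per-bucket de-duplication the check performs can be sketched as follows; this is an illustrative Python model, and the file-naming scheme and keep-largest policy are assumptions, not Hive's exact Java logic.

```python
# Hypothetical sketch of the de-duplication that
# Utilities.removeTempOrDuplicateFiles() must perform: when two task
# attempts (e.g. "000000_0" and "000000_1") both wrote a file for the
# same bucket, only one file per bucket id may survive. Naming and the
# keep-largest policy are illustrative assumptions.

def remove_duplicate_bucket_files(files):
    """files: dict of file name -> size; name format '<bucket>_<attempt>'."""
    keep = {}
    for name, size in files.items():
        bucket = name.split("_")[0]
        # Keep at most one attempt's output per bucket; prefer the larger
        # file as a stand-in for "the attempt that finished properly".
        if bucket not in keep or size > files[keep[bucket]]:
            keep[bucket] = name
    return sorted(keep.values())

# Duplicate outputs for bucket 000000 from attempts 0 and 1:
print(remove_duplicate_bucket_files(
    {"000000_0": 500, "000000_1": 800, "000001_0": 300}))
# → ['000000_1', '000001_0']
```

The bug is not in this logic itself but in when it runs: A_0's file arrives after the check, so no amount of de-duplication at that point can catch it.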
[jira] [Resolved] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-16710. - Resolution: Fixed Fix Version/s: 1.3.0 Release Note: This issue is resolved in 2.3 and 3.0 via a different commit (HIVE-12274) > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 1.3.0 > > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
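The fix the ticket asks for is simply to turn the hard-coded limit into a config lookup. A minimal Python sketch of the idea follows; the config key name and default value below are made up for illustration, and the real change would live in the Java metastore.

```python
# Sketch of a configurable type-name length check in the spirit of
# HIVE-16710. The key "hive.metastore.max.typename.length" and the
# default are illustrative assumptions, not Hive's real names/values.

DEFAULT_MAX_TYPENAME_LENGTH = 2000  # illustrative default

def validate_type_name(type_name, conf=None):
    conf = conf or {}
    # A configurable limit lets users with very long type names
    # (e.g. deeply nested structs) raise it instead of failing hard.
    limit = int(conf.get("hive.metastore.max.typename.length",
                         DEFAULT_MAX_TYPENAME_LENGTH))
    if len(type_name) > limit:
        raise ValueError(
            f"Type name length {len(type_name)} exceeds limit {limit}")
    return type_name
```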
[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17153: Status: Patch Available (was: Open) > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102468#comment-16102468 ] Hive QA commented on HIVE-17006: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879056/HIVE-17006.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part5] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[schemeAuthority] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6138/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6138/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6138/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12879056 - PreCommit-HIVE-Build > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
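Option (3) from the description — caching raw Parquet column-chunk bytes as they sit on disk — can be modeled very simply: a cache keyed by (file id, chunk offset, chunk length) that performs the disk read only on a miss. This is an illustrative Python sketch, not LLAP's actual Java cache, which sits on top of a low-level buffer allocator.

```python
# Minimal model of column-chunk-level caching: raw bytes are cached
# as-is, keyed by where they live on disk. Names are illustrative.

class ColumnChunkCache:
    def __init__(self):
        self._chunks = {}

    def read_chunk(self, file_id, offset, length, read_from_disk):
        key = (file_id, offset, length)
        if key not in self._chunks:
            # Miss: do the disk read once and keep the raw bytes.
            self._chunks[key] = read_from_disk(offset, length)
        return self._chunks[key]

calls = []
def fake_disk_read(offset, length):
    calls.append((offset, length))
    return b"x" * length

cache = ColumnChunkCache()
cache.read_chunk("f1", 0, 4, fake_disk_read)
cache.read_chunk("f1", 0, 4, fake_disk_read)  # second read served from cache
print(len(calls))  # → 1
```

This matches the point in the description: since Parquet is read at column-chunk granularity anyway, caching at that granularity requires no changes to the reader's access pattern.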
[jira] [Reopened] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reopened HIVE-16710: - actually this is still needed for branch-1 > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
[ https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17153: Attachment: HIVE-17153.1.patch > Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] > - > > Key: HIVE-17153 > URL: https://issues.apache.org/jira/browse/HIVE-17153 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17153.1.patch > > > {code} > Client Execution succeeded but contained differences (error code = 1) after > executing spark_dynamic_partition_pruning.q > 3703c3703 > < target work: Map 4 > --- > > target work: Map 1 > 3717c3717 > < target work: Map 1 > --- > > target work: Map 4 > 3746c3746 > < target work: Map 4 > --- > > target work: Map 1 > 3760c3760 > < target work: Map 1 > --- > > target work: Map 4 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102427#comment-16102427 ] Vineet Garg commented on HIVE-16811: RB is here https://reviews.apache.org/r/61165/ > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch > > > Currently Join ordering completely bails out in absence of statistics and > this could lead to bad joins such as cross joins. > e.g. following select query will produce cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING) > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBERINT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAGSTRING, > L_LINESTATUSSTRING, > l_shipdate STRING, > L_COMMITDATESTRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl > int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent join ordering algorithm to bail out and come up > with join at least better than cross join -- This message was sent by Atlassian JIRA (v6.4.14#64029)
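The kind of fallback estimation this ticket proposes can be sketched like this: derive a nonzero row count from the on-disk data size and assumed per-column widths, so the join-ordering algorithm never bails out into a cross join. The widths below are illustrative guesses, not Hive's actual defaults.

```python
# Illustrative sketch of estimating a row count when metastore stats
# are absent. Column widths are assumptions for the sketch only.

ASSUMED_WIDTH = {"int": 4, "double": 8, "string": 100}

def estimate_row_count(raw_data_size_bytes, column_types):
    row_width = sum(ASSUMED_WIDTH.get(t, 8) for t in column_types)
    # Never report zero rows: a nonzero guess keeps the join-ordering
    # algorithm from bailing out into a cross join.
    return max(1, raw_data_size_bytes // max(1, row_width))

print(estimate_row_count(11200, ["int", "string", "int"]))  # → 103
```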
[jira] [Comment Edited] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102411#comment-16102411 ] Sergey Shelukhin edited comment on HIVE-16710 at 7/26/17 10:47 PM: --- Superseded by HIVE-12274 was (Author: sershe): Superceded by HIVE-12274 > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16710: Resolution: Implemented Status: Resolved (was: Patch Available) Superseded by HIVE-12274 > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
[ https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16710: Fix Version/s: (was: 3.0.0) > Make MAX_MS_TYPENAME_LENGTH configurable > > > Key: HIVE-16710 > URL: https://issues.apache.org/jira/browse/HIVE-16710 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch > > > HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 > (HIVE-12274), users have no way to work around this check if they do get very > long type name. We should make max type name length configurable before 2.3. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17050: -- Attachment: HIVE-17050.4.patch Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resubmit the patch. > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH, HIVE-17050.4.patch > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
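The failure mode in the bug description is easy to model: once beeline collapses the lines of a multi-line `-e` query, a naive comment strip from the first `--` onward swallows the rest of the statement, so `select 1, --test\n2` parses as `select 1,`. Comments must be stripped per line, before the lines are joined. A simplified Python sketch of that per-line approach follows; it deliberately ignores the harder case of `--` appearing inside quoted strings, and is not the actual patch.

```python
# Simplified sketch: strip "--" line comments per line, then join.
# Does not handle "--" inside string literals (the real fix must).

def strip_line_comments(query):
    cleaned = []
    for line in query.split("\n"):
        idx = line.find("--")
        cleaned.append(line if idx < 0 else line[:idx])
    return " ".join(part.strip() for part in cleaned).strip()

print(strip_line_comments("select 1, --test\n2"))  # → select 1, 2
```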
[jira] [Comment Edited] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102389#comment-16102389 ] Yibing Shi edited comment on HIVE-17050 at 7/26/17 10:30 PM: - Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resubmit the patch. was (Author: yibing): Hi [~pvary], I was in a rush and accidentally changed the pom.xml. Sorry for the mess! Resutmit the patch. > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH, HIVE-17050.4.patch > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: HIVE-16077.08.patch > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch, HIVE-16077.08.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-11266) count(*) wrong result based on table statistics for external tables
[ https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11266: Summary: count(*) wrong result based on table statistics for external tables (was: count(*) wrong result based on table statistics) > count(*) wrong result based on table statistics for external tables > --- > > Key: HIVE-11266 > URL: https://issues.apache.org/jira/browse/HIVE-11266 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Simone Battaglia >Assignee: Pengcheng Xiong >Priority: Critical > > Hive returns a wrong count result on an external table with table statistics if > I change the table data files. > This is the scenario in detail: > 1) create external table my_table (...) location 'my_location'; > 2) analyze table my_table compute statistics; > 3) change/add/delete one or more files in 'my_location' directory; > 4) select count(\*) from my_table; > In this case the count query doesn't generate a MR job and returns the result > based on table statistics. This result is wrong because it is based on > statistics stored in the Hive metastore and doesn't take into account > modifications introduced on the data files. > Obviously when setting "hive.compute.query.using.stats" to FALSE this problem > doesn't occur, but the default value of this property is TRUE. > I think that this post on stackoverflow, which shows another type of bug > in the case of multiple inserts, is also related to the one that I reported: > http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics
[ https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102369#comment-16102369 ] Sergey Shelukhin commented on HIVE-11266: - [~pxiong] the issue is still there for external tables. The semantics for external tables in Hive are not well defined, but many people assume (and I agree) that it's ok to manually manage these using file operations, which invalidates the stats without Hive knowing about it. I don't think this setting should be used for external tables. Thoughts? cc [~ashutoshc] [~hagleitn] > count(*) wrong result based on table statistics > --- > > Key: HIVE-11266 > URL: https://issues.apache.org/jira/browse/HIVE-11266 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Simone Battaglia >Assignee: Pengcheng Xiong >Priority: Critical > > Hive returns a wrong count result on an external table with table statistics if > I change the table data files. > This is the scenario in detail: > 1) create external table my_table (...) location 'my_location'; > 2) analyze table my_table compute statistics; > 3) change/add/delete one or more files in 'my_location' directory; > 4) select count(\*) from my_table; > In this case the count query doesn't generate a MR job and returns the result > based on table statistics. This result is wrong because it is based on > statistics stored in the Hive metastore and doesn't take into account > modifications introduced on the data files. > Obviously when setting "hive.compute.query.using.stats" to FALSE this problem > doesn't occur, but the default value of this property is TRUE. > I think that this post on stackoverflow, which shows another type of bug > in the case of multiple inserts, is also related to the one that I reported: > http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
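The guard suggested in the comment above — never answer count(*) from metastore stats for external tables — can be sketched as a simple predicate. Field and key names below are illustrative, not Hive's actual internal API.

```python
# Sketch of the proposed guard: stats can answer count(*) only for
# managed tables, because external table files can change behind
# Hive's back. Table/field names are illustrative assumptions.

def can_answer_count_from_stats(table, conf):
    if not conf.get("hive.compute.query.using.stats", True):
        return False
    if table.get("type") == "EXTERNAL_TABLE":
        # Files may have been added/removed directly on the filesystem,
        # so stored row counts cannot be trusted.
        return False
    return table.get("stats_accurate", False)

print(can_answer_count_from_stats(
    {"type": "EXTERNAL_TABLE", "stats_accurate": True}, {}))  # → False
```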
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102342#comment-16102342 ] Sergey Shelukhin commented on HIVE-17006: - RB at https://reviews.apache.org/r/61164/ [~Ferd] I cannot find your username on RB, I can add you if you tell me what it is :) > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102338#comment-16102338 ] liyunzhang_intel edited comment on HIVE-17087 at 7/26/17 9:53 PM: -- [~stakiar]: LGTM +1, meanwhile can you spend some time to review HIVE-16948, another bug about dpp in HOS, thanks! was (Author: kellyzly): [~stakiar]: GTM +1, meanwhile can you spend some time to review HIVE-16948, another bug about dpp in HOS, thanks! > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 
> Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 >
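The cleanup this patch performs can be modeled simply: after a join is converted to a map join, a Spark partition-pruning sink whose target work now feeds the map join's hash-table (small-table) side is unnecessary, and its producing branch can be removed. The graph model below is an illustrative Python sketch, not the real operator-tree walk in the patch.

```python
# Sketch: drop DPP sinks whose target work became the small-table side
# of a map join. Data model (dicts with 'target_work') is illustrative.

def prune_useless_dpp_sinks(dpp_sinks, small_table_works):
    """dpp_sinks: list of dicts with a 'target_work' key;
    small_table_works: set of work names feeding hash-table sinks."""
    return [s for s in dpp_sinks
            if s["target_work"] not in small_table_works]

sinks = [{"id": "dpp1", "target_work": "Map 2"},
         {"id": "dpp2", "target_work": "Map 1"}]
# After map-join conversion, "Map 2" feeds a hash-table sink (small table):
print([s["id"] for s in prune_useless_dpp_sinks(sinks, {"Map 2"})])
# → ['dpp2']
```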
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102338#comment-16102338 ] liyunzhang_intel commented on HIVE-17087: - [~stakiar]: GTM +1, meanwhile can you spend some time to review HIVE-16948, another bug about dpp in HOS, thanks! > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic 
stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >
[jira] [Updated] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17006: Status: Patch Available (was: Open) > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102336#comment-16102336 ] Sergey Shelukhin edited comment on HIVE-17006 at 7/26/17 9:51 PM: -- The initial patch after some cleanup, additions and fixes. This shares a lot of the code with HIVE-15665 and the two metadata caches need to be merged. Presumably one of these would be committed first and the other would be merged. Still need to test on the cluster was (Author: sershe): The initial patch after some cleanup, additions and fixes. This shares a lot of the code with HIVE-15665 and the two metadata caches need to be merged. Presumably one of these would be committed first and the other would be merged. > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17006: Attachment: HIVE-17006.patch The initial patch, after some cleanup, additions, and fixes. This shares a lot of code with HIVE-15665, and the two metadata caches need to be merged. Presumably one of these would be committed first and the other would be merged.
[jira] [Commented] (HIVE-17160) Adding Kerberos authorization to the Druid Hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102291#comment-16102291 ] Siddharth Seth commented on HIVE-17160: --- The DagUtils part of the changes looks good to me. I believe the existing code for non-cluster split generation was already non-functional, based on the testing you had done? If that's the case, there's no reason to add another option to choose between GenerateSplitsOnClientMethod1 and GenerateSplitsOnClientMethod2. > Adding Kerberos authorization to the Druid Hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.patch > > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials.
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102262#comment-16102262 ] Sahil Takiar commented on HIVE-16998: - [~janulatha] looks like there are some issues with building your patch, probably because {{SparkRemoveDynamicPruningBySize.java}} was renamed. You can run {{./dev-support/smart-apply-patch.sh}} locally to see how Hive QA will apply your patch file. Also, I was taking a closer look at the new .q file you added, and something doesn't look correct in your explain plan. When DPP is enabled for your query, Stage-2 looks incorrect. It looks like there is an extraneous reduce phase in the operator plan. If you run {{spark_dynamic_partition_pruning.q}} locally, you will probably see the issue too. Can you update the RB for this JIRA? > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch, HIVE16998.2.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins.
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Resolution: Fixed Status: Resolved (was: Patch Available) > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 9.3 and excluding several other > dependencies on HIVE-16049, the HS2 WebUI stopped working, throwing an NPE. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102253#comment-16102253 ] Hive QA commented on HIVE-16998: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12879034/HIVE16998.2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6137/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6137/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6137/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-26 20:42:31.265 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-6137/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-26 20:42:31.268 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at f21462d HIVE-17128: Operation Logging leaks file descriptors as the log4j Appender is never closed (Andrew Sherman, reviewed by Aihua Xu and Peter Vary) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at f21462d HIVE-17128: Operation Logging leaks file descriptors as the log4j Appender is never closed (Andrew Sherman, reviewed by Aihua Xu and Peter Vary) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-26 20:42:37.439 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p0 patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Hunk #1 succeeded at 3425 (offset 19 lines). 
patching file itests/src/test/resources/testconfiguration.properties patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruning.java (renamed from ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java) patching file ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java patching file ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_mapjoin_only.q patching file ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_mapjoin_only.q.out + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven ANTLR Parser Generator Version 3.5.2 Output file /data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java does not exist: must build /data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g org/apache/hadoop/hive/metastore/parser/Filter.g DataNucleus Enhancer (version 4.1.17) for API "JDO" DataNucleus Enhancer : Classpath >> /usr/share/maven/boot/plexus-classworlds-2.x.jar ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo ENHANCED (Persistable) : org.apache.hadoop.hive.me
[jira] [Commented] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102249#comment-16102249 ] Aihua Xu commented on HIVE-17088: - +1 on the addendum patch.
[jira] [Commented] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102245#comment-16102245 ] Sergio Peña commented on HIVE-17088: The test failures are unrelated and were already failing in older jobs.
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16998: Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102223#comment-16102223 ] Sahil Takiar commented on HIVE-16998: - [~janulatha] I changed the status of this JIRA to "Patch Available" so that Hive QA picks it up and runs the pre-commit job.
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102221#comment-16102221 ] Sahil Takiar commented on HIVE-16998: - Overall, the approach LGTM, just a few more comments: * I think we should reverse the hierarchy of the configs ** If {{hive.spark.dynamic.partition.pruning.map.join.only}} is true and {{hive.spark.dynamic.partition.pruning}} is false, then DPP is only enabled for map-joins ** If {{hive.spark.dynamic.partition.pruning}} is true, DPP is enabled for shuffle joins and map-joins, and the value of {{hive.spark.dynamic.partition.pruning.map.join.only}} is ignored ** The advantage is that if a user wants to enable DPP just for map-joins, there is only one config to set; if they want to enable it for both shuffle joins and map-joins, there is also only one config to set * Can you update the javadocs for {{SparkRemoveDynamicPruning}}? It no longer just removes DPP based on estimated output size * {{Stage: Stage-0 - Fetch Operator}} isn't actually a Spark job; it's the {{FetchOperator}} that is run by HS2 * There is a line in the qfile that says {{-- checking without partition pruning enabled}} even though DPP is enabled
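The config precedence proposed in the comment above can be sketched as a single predicate. This is a sketch of the proposal, not the committed behavior; the parameter names simply mirror the two config keys under discussion:

```java
/** Sketch of the proposed precedence between the two HoS DPP configs. */
public class DppConfigSketch {
    /**
     * @param dppEnabled      value of hive.spark.dynamic.partition.pruning
     * @param mapJoinOnly     value of hive.spark.dynamic.partition.pruning.map.join.only
     * @param targetIsMapJoin whether the pruning sink's join is a map-join
     */
    public static boolean shouldApplyDpp(boolean dppEnabled, boolean mapJoinOnly,
                                         boolean targetIsMapJoin) {
        if (dppEnabled) {
            // DPP for both shuffle joins and map-joins; map.join.only is ignored.
            return true;
        }
        // Map-join-only mode: DPP applies only when the join is a map-join.
        return mapJoinOnly && targetIsMapJoin;
    }
}
```

The point of this shape is that either use case needs exactly one config flipped: {{map.join.only}} alone for map-join-only pruning, or the main {{dynamic.partition.pruning}} flag alone for pruning everywhere.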
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16998: --- Attachment: HIVE16998.2.patch Incorporated comments from Lefty and Sahil. Modularized the map-join check so that the same code can be called both when splitting the operator tree for DPP and when checking whether to remove DPP. Renamed SparkRemoveDynamicPruningBySize.java to SparkRemoveDynamicPruning.java.
[jira] [Assigned] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs
[ https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-17179: --- > Add InterfaceAudience and InterfaceStability annotations for Hook APIs > -- > > Key: HIVE-17179 > URL: https://issues.apache.org/jira/browse/HIVE-17179 > Project: Hive > Issue Type: Sub-task > Components: Hooks >Reporter: Sahil Takiar >Assignee: Sahil Takiar > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17178) Spark Partition Pruning Sink Operator can't target multiple Works
[ https://issues.apache.org/jira/browse/HIVE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-17178: --- > Spark Partition Pruning Sink Operator can't target multiple Works > - > > Key: HIVE-17178 > URL: https://issues.apache.org/jira/browse/HIVE-17178 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > A Spark Partition Pruning Sink Operator cannot be used to target multiple Map > Work objects. The entire DPP subtree (SEL-GBY-SPARKPRUNINGSINK) is duplicated > if a single table needs to be used to target multiple Map Works. > The following query shows the issue: > {code} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table part_table_1 (col int) partitioned by (part_col int); > create table part_table_2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table part_table_1 add partition (part_col=1); > insert into table part_table_1 partition (part_col=1) values (1), (2), (3), > (4); > alter table part_table_1 add partition (part_col=2); > insert into table part_table_1 partition (part_col=2) values (1), (2), (3), > (4); > alter table part_table_2 add partition (part_col=1); > insert into table part_table_2 partition (part_col=1) values (1), (2), (3), > (4); > alter table part_table_2 add partition (part_col=2); > insert into table part_table_2 partition (part_col=2) values (1), (2), (3), > (4); > explain select * from regular_table, part_table_1, part_table_2 where > regular_table.col = part_table_1.part_col and regular_table.col = > part_table_2.part_col; > {code} > The explain plan is > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > 
TableScan > alias: regular_table > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: col is not null (type: boolean) > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col0 (type: int) > 1 _col1 (type: int) > 2 _col1 (type: int) > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 3 > Local Work: > Map Reduce Local Work > Map 3 > Map Operator Tree:
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Description: Implement Native Vector GroupBy using the fast hash table technology developed for Native Vector MapJoin, etc. The patch is currently limited to a single Long key, aggregation on Long columns, and no more than 31 columns. Three new classes are introduced that store the count in the slot table and don't allocate hash elements: {noformat} COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator {noformat} And a new class that aggregates a single Long key: {noformat} VectorGroupByHashOneLongKeyOperator {noformat} was:Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin, etc. > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using the fast hash table technology developed > for Native Vector MapJoin, etc. > The patch is currently limited to a single Long key, aggregation on Long columns, > and no more than 31 columns. > Three new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat}
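The "store the count in the slot table" idea can be illustrated with a toy open-addressing table — a hypothetical sketch, not the HIVE-12369 code. Keys and counts live in two parallel long arrays, so a COUNT aggregation allocates no per-key hash entry objects at all:

```java
/** Toy long-key COUNT table: parallel arrays, linear probing, no per-entry allocation. */
public class LongKeyCountSketch {
    private static final int CAPACITY = 1 << 10;      // power of two; no resize in this sketch
    private final long[] keys = new long[CAPACITY];
    private final long[] counts = new long[CAPACITY]; // count == 0 marks an empty slot

    /** Increments the count for {@code key} (assumes the table never fills up). */
    public void add(long key) {
        int slot = (int) (mix(key) & (CAPACITY - 1));
        while (counts[slot] != 0 && keys[slot] != key) {
            slot = (slot + 1) & (CAPACITY - 1);       // linear probe to the next slot
        }
        keys[slot] = key;
        counts[slot]++;
    }

    /** Returns the count for {@code key}, or 0 if the key was never added. */
    public long count(long key) {
        int slot = (int) (mix(key) & (CAPACITY - 1));
        while (counts[slot] != 0) {
            if (keys[slot] == key) {
                return counts[slot];
            }
            slot = (slot + 1) & (CAPACITY - 1);
        }
        return 0;
    }

    private static long mix(long k) {                 // cheap avalanching hash for long keys
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        return k;
    }
}
```

Avoiding per-entry objects is what makes this "native": there is nothing for the GC to trace per group, which is the same property the fast Native Vector MapJoin tables exploit.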
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Status: Patch Available (was: In Progress) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Attachment: HIVE-12369.05.patch > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Status: In Progress (was: Patch Available) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16758) Better Select Number of Replications
[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102124#comment-16102124 ] BELUGA BEHR commented on HIVE-16758: [~aihuaxu] [~pvary] [~ctang.ma] Thoughts? > Better Select Number of Replications > > > Key: HIVE-16758 > URL: https://issues.apache.org/jira/browse/HIVE-16758 > Project: Hive > Issue Type: Improvement >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16758.1.patch > > > {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}} > We should be smarter about how we pick a replication number. We should add a > new configuration equivalent to {{mapreduce.client.submit.file.replication}}. > This value should be around the square root of the number of nodes and not > hard-coded in the code. > {code} > public static final String DFS_REPLICATION_MAX = "dfs.replication.max"; > private int minReplication = 10; > @Override > protected void initializeOp(Configuration hconf) throws HiveException { > ... > int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > } > {code} > https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml -- This message was sent by Atlassian JIRA (v6.4.14#64029)
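[Editor's note] The heuristic proposed in HIVE-16758 (a replication factor near the square root of the node count, capped by {{dfs.replication.max}}) can be sketched as below. Class and method names are illustrative, not part of the attached patch:

```java
// Sketch of the replication heuristic described above (names are made up,
// not Hive's): aim for roughly sqrt(numNodes) replicas, but never exceed
// the cluster's dfs.replication.max, and always keep at least one replica.
public class ReplicationChooser {
    public static int chooseReplication(int numNodes, int dfsReplicationMax) {
        int bySqrt = (int) Math.ceil(Math.sqrt(numNodes));
        return Math.max(1, Math.min(bySqrt, dfsReplicationMax));
    }
}
```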
[jira] [Commented] (HIVE-17132) Add InterfaceAudience and InterfaceStability annotations for UDF APIs
[ https://issues.apache.org/jira/browse/HIVE-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102112#comment-16102112 ] Sahil Takiar commented on HIVE-17132: - [~ashutoshc] - created RB: https://reviews.apache.org/r/61145/ > Add InterfaceAudience and InterfaceStability annotations for UDF APIs > - > > Key: HIVE-17132 > URL: https://issues.apache.org/jira/browse/HIVE-17132 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17132.1.patch > > > Add InterfaceAudience and InterfaceStability annotations for UDF APIs. UDFs > are a useful plugin point for Hive users, and there are a number of external > UDF libraries, such as hivemall. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Status: Open (was: Patch Available) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: HIVE-16077.07.patch > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Status: Patch Available (was: Open) > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch, > HIVE-16077.07.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17132) Add InterfaceAudience and InterfaceStability annotations for UDF APIs
[ https://issues.apache.org/jira/browse/HIVE-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102086#comment-16102086 ] Ashutosh Chauhan commented on HIVE-17132: - Can you please create a RB for this patch? > Add InterfaceAudience and InterfaceStability annotations for UDF APIs > - > > Key: HIVE-17132 > URL: https://issues.apache.org/jira/browse/HIVE-17132 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17132.1.patch > > > Add InterfaceAudience and InterfaceStability annotations for UDF APIs. UDFs > are a useful plugin point for Hive users, and there are a number of external > UDF libraries, such as hivemall. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102084#comment-16102084 ] Ashutosh Chauhan commented on HIVE-16811: - Can you please create a RB for the latest patch? > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch > > > Currently join ordering completely bails out in the absence of statistics and > this could lead to bad joins such as cross joins. > e.g. the following select query will produce a cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING) > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl > int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it come up > with a join at least better than a cross join -- This message was sent by Atlassian JIRA (v6.4.14#64029)
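[Editor's note] One simple way to "estimate statistics in absence of stats" is to derive a row-count guess from the on-disk data size and an assumed average row width, so the join-ordering cost model has a non-zero cardinality to work with. This is an editor's sketch of the general idea, not the estimator implemented by the HIVE-16811 patch:

```java
// When real column/table stats are missing, estimate the row count from raw
// data size and a guessed average row width. Returning at least 1 keeps the
// cost model from treating the table as empty. (Illustrative only, not Hive's
// actual StatsRulesProcFactory logic.)
public class BasicStatsEstimator {
    public static long estimateRowCount(long rawDataSizeBytes, int assumedAvgRowBytes) {
        if (assumedAvgRowBytes <= 0) {
            return 1; // no usable width guess; fall back to a minimal estimate
        }
        return Math.max(1, rawDataSizeBytes / assumedAvgRowBytes);
    }
}
```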
[jira] [Commented] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102043#comment-16102043 ] Hive QA commented on HIVE-17088: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878993/HIVE-17088.addendum1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11012 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[setop_no_distinct] (batchId=77) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6134/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6134/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6134/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12878993 - PreCommit-HIVE-Build > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 3.9 and excluding several other > dependencies on HIVE-16049, the HS2 webui stopped working and throwing a NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at 
org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.Select
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101841#comment-16101841 ] Sahil Takiar commented on HIVE-17087: - [~lirui], [~kellyzly] patch has been updated, any other comments? > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select 
Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 > keys: >
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Attachment: (was: HIVE-17088.addendum1.patch) > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 3.9 and excluding several other > dependencies on HIVE-16049, the HS2 webui stopped working and throwing a NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Attachment: HIVE-16077.06.patch > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch, HIVE-16077.05.patch, HIVE-16077.06.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101771#comment-16101771 ] Andrew Sherman commented on HIVE-17128: --- Thank you [~pvary]! > Operation Logging leaks file descriptors as the log4j Appender is never closed > -- > > Key: HIVE-17128 > URL: https://issues.apache.org/jira/browse/HIVE-17128 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Andrew Sherman >Assignee: Andrew Sherman > Fix For: 3.0.0 > > Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch, > HIVE-17128.3.patch > > > [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 > RoutingAppender to automatically output the log for each query into each > individual operation log file. As log4j does not know when a query is > finished it keeps the OutputStream in the Appender open even when the query > completes. The stream holds a file descriptor and so we leak file > descriptors. Note that we are already careful to close any streams reading > from the operation log file. > h2. Fix > To fix this we use a technique described in the comments of [LOG4J2-510] > which uses reflection to close the appender. The test in > TestOperationLoggingLayout will be extended to check that the Appender is > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
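[Editor's note] The fix described above reaches into log4j internals via reflection to close the leaked appender stream. The log4j-specific field and method names come from the LOG4J2-510 discussion and are not reproduced here; the snippet below is a generic, stdlib-only demonstration of the reflection pattern itself (read a private field, then close the resource it holds), with made-up class and field names:

```java
import java.lang.reflect.Field;

// Generic illustration of the reflection technique the fix uses: access a
// private field on an object and close the Closeable it holds. The real
// HIVE-17128 patch applies this pattern to log4j's RoutingAppender internals
// (per LOG4J2-510); Holder and "resource" here are hypothetical stand-ins.
public class ReflectiveCloser {
    public static class Holder {
        private final java.io.Closeable resource;
        public Holder(java.io.Closeable resource) { this.resource = resource; }
    }

    public static void closePrivateResource(Object target, String fieldName) throws Exception {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true); // bypass private access, as the patch must do
        ((java.io.Closeable) f.get(target)).close();
    }
}
```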
[jira] [Commented] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101749#comment-16101749 ] Peter Vary commented on HIVE-17050: --- Thanks for the patch [~Yibing]! Just one quick question: - Why are there pom.xml changes in the patch? Otherwise looks good to me. BTW: I am not sure that the patch will be picked up by the pre-commit if the extension is in upper case. Please check it later if you have time. Thanks, Peter > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH > > > After applying HIVE-13864, multi-line queries that have a comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Attachment: HIVE-17088.addendum1.patch All tests are flaky and past jobs are running into the same errors, but there is one test {{TestJdbcWithMiniHS2}} that is new. It does not look related to the patch, though, but I will restart HiveQA to see if it continues failing. It does not fail on my local machine. > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch, > HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 3.9 and excluding several other > dependencies on HIVE-16049, the HS2 webui stopped working and throwing an NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. 
Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"
[ https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17050: -- Attachment: HIVE-17050.3.PATCH Submit a new patch > Multiline queries that have comment in middle fail when executed via "beeline > -e" > - > > Key: HIVE-17050 > URL: https://issues.apache.org/jira/browse/HIVE-17050 > Project: Hive > Issue Type: Bug >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, > HIVE-17050.3.PATCH > > > After applying HIVE-13864, multiple line queries that have comment at the end > of one of the middle lines fail when executed via beeline -e > {noformat} > $ beeline -u "" -e "select 1, --test > > 2" > scan complete in 3ms > .. > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:9 > cannot recognize input near '' '' '' in selection target > (state=42000,code=4) > Closing: 0: > jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
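[Editor's note] The bug above is about "--" line comments surviving into a query that is later joined into a single line, which corrupts the statement. The sketch below illustrates the kind of per-line comment trimming involved; it is an editor's simplified illustration, not the actual Beeline/HIVE-13864 code, and its quote handling is deliberately minimal (a "--" inside single quotes is preserved):

```java
// Illustrative sketch (not Beeline's implementation): strip "--" line
// comments from each line of a multi-line query before the lines are joined,
// keeping "--" sequences that appear inside single-quoted literals.
public class CommentStripper {
    public static String stripLineComments(String multiLineQuery) {
        StringBuilder out = new StringBuilder();
        for (String line : multiLineQuery.split("\n", -1)) {
            boolean inQuote = false;
            int cut = line.length();
            for (int i = 0; i < line.length() - 1; i++) {
                char ch = line.charAt(i);
                if (ch == '\'') {
                    inQuote = !inQuote;
                } else if (!inQuote && ch == '-' && line.charAt(i + 1) == '-') {
                    cut = i; // comment starts here; drop the rest of the line
                    break;
                }
            }
            String kept = line.substring(0, cut).trim();
            if (kept.isEmpty()) {
                continue; // the whole line was a comment or whitespace
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(kept);
        }
        return out.toString();
    }
}
```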
[jira] [Commented] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101538#comment-16101538 ]

anishek commented on HIVE-16896:
--------------------------------
Self reminder to add docs for "hive.repl.approx.max.load.tasks".

> move replication load related work in semantic analysis phase to execution phase using a task
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>
> We do not want to create too many tasks in memory in the analysis phase while loading data. Currently we load all the files in the bootstrap dump location as {{FileStatus[]}} and then iterate over it to load objects; we should rather move to
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which would internally batch and return values.
> Additionally, since we can't hand off partial tasks from the analysis phase to the execution phase, we are going to move the whole repl load functionality to the execution phase so we can better control creation/execution of tasks (not related to hive {{Task}}; we may get rid of ReplCopyTask).
> An additional consideration at the end of this JIRA is whether we want to specifically do a multi-threaded load of the bootstrap dump.
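The batching behaviour the comment relies on can be sketched without any Hadoop dependency: an iterator that pulls one fixed-size batch at a time from its source on demand, instead of materializing the full listing (as a `FileStatus[]` would). The class and names below are illustrative stand-ins for the `RemoteIterator` pattern, not Hadoop's implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Illustrative sketch of the RemoteIterator idea: fetch results in
// small batches on demand, keeping at most one batch in memory.
// (offset, size) -> batch is a stand-in for the underlying listing call.
public class BatchedIterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchBatch;
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int offset = 0;
    private boolean exhausted = false;

    public BatchedIterator(BiFunction<Integer, Integer, List<T>> fetchBatch, int batchSize) {
        this.fetchBatch = fetchBatch;
        this.batchSize = batchSize;
    }

    public boolean hasNext() {
        if (buffer.isEmpty() && !exhausted) {
            List<T> batch = fetchBatch.apply(offset, batchSize);
            offset += batch.size();
            if (batch.size() < batchSize) exhausted = true; // short batch = end of listing
            buffer.addAll(batch);
        }
        return !buffer.isEmpty();
    }

    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.poll();
    }
}
```

Callers iterate with `hasNext()`/`next()` exactly as they would with Hadoop's `RemoteIterator<LocatedFileStatus>`, so memory stays bounded by the batch size rather than the directory size.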
[jira] [Updated] (HIVE-16614) Support "set local time zone" statement
[ https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-16614:
-------------------------------------------
    Attachment: HIVE-16614.03.patch

> Support "set local time zone" statement
> ---------------------------------------
>
>                 Key: HIVE-16614
>                 URL: https://issues.apache.org/jira/browse/HIVE-16614
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Carter Shanklin
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-16614.01.patch, HIVE-16614.02.patch, HIVE-16614.03.patch, HIVE-16614.patch
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently applied when converting between timezone-unaware types and timezone-aware types and, in Hive's case, are also used to shift a timezone-aware type to a different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a session level, so that clients can access a database simultaneously from different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be more convenient for users if they have the ability to set their time zone of choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first using an interval and second using the special keyword LOCAL.
> Examples:
> • set time zone '-8:00';
> • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the session's original default time zone displacement.
> Reference: SQL:2011 section 19.4
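The two SQL forms described above could be parsed roughly as follows with `java.time`. The helper name is hypothetical and this is not Hive's actual implementation — just a sketch of mapping the statement's argument to a zone:

```java
import java.time.ZoneId;

// Illustrative parser for the two "set time zone" forms above:
//   set time zone '-8:00';   -> a fixed offset displacement
//   set time zone LOCAL;     -> the session's original (system) zone
// Hypothetical helper, not Hive's actual code.
public class TimeZoneSetting {
    public static ZoneId parse(String value) {
        if ("LOCAL".equalsIgnoreCase(value)) {
            return ZoneId.systemDefault();
        }
        // java.time requires two-digit hours ("-08:00"); SQL allows
        // single-digit forms like '-8:00', so pad the hour if needed.
        String v = value;
        if (v.matches("[+-]\\d:\\d{2}")) {
            v = v.charAt(0) + "0" + v.substring(1);
        }
        return ZoneId.of(v);   // offset strings yield a ZoneOffset
    }
}
```

Each session would carry its own parsed `ZoneId`, so concurrent clients in different time zones see time values displaced independently.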
[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-17128:
------------------------------
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved  (was: Patch Available)

Pushed to master. Thanks for your contribution [~asherman]!

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-17128
>                 URL: https://issues.apache.org/jira/browse/HIVE-17128
>             Project: Hive
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Andrew Sherman
>            Assignee: Andrew Sherman
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch, HIVE-17128.3.patch
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 RoutingAppender to automatically output the log for each query into each individual operation log file. As log4j does not know when a query is finished, it keeps the OutputStream in the Appender open even when the query completes. The stream holds a file descriptor, and so we leak file descriptors. Note that we are already careful to close any streams reading from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] which uses reflection to close the appender. The test in TestOperationLoggingLayout will be extended to check that the Appender is closed.
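The LOG4J2-510 technique mentioned in the fix boils down to reaching a resource held in a private field that has no public close path, and closing it via reflection. The real patch targets log4j2's RoutingAppender internals; the sketch below is a generic plain-Java illustration of the reflective-close idea only, with stand-in names:

```java
import java.io.Closeable;
import java.lang.reflect.Field;

// Generic illustration of the reflective-close idea (NOT the actual
// log4j2 fix): when a library keeps an open stream in a private field
// with no public way to close it, reach the field via reflection and
// close it so the file descriptor is released.
public class ReflectiveCloser {
    public static void closePrivateField(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);                 // bypass private access
            Object value = f.get(target);
            if (value instanceof Closeable) {
                ((Closeable) value).close();       // releases the descriptor
            }
        } catch (Exception e) {
            throw new RuntimeException("failed to close " + fieldName, e);
        }
    }
}
```

A test (as the description proposes for TestOperationLoggingLayout) would then assert that the stream is actually closed after the query's appender is torn down.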
[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-17072:
------------------------------
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved  (was: Patch Available)

Thanks for the patch [~kuczoram]! Pushed to master.
Please take some time and document the new config here: https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-BeelineQueryUnitTest
Thanks, Peter

> Make the parallelized timeout configurable in BeeLine tests
> -----------------------------------------------------------
>
>                 Key: HIVE-17072
>                 URL: https://issues.apache.org/jira/browse/HIVE-17072
>             Project: Hive
>          Issue Type: Improvement
>          Components: Testing Infrastructure
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17072.1.patch, HIVE-17072.2.patch
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
>     executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
>     throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.
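One straightforward way to make the hardcoded 10-minute wait configurable is to read it from a system property with the old value as default. The property name below is illustrative; the actual name chosen by the patch may differ:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: replace the hardcoded awaitTermination(10, MINUTES) with a
// value read from a system property. "test.beeline.parallel.timeout"
// is an illustrative name, not necessarily the one the patch uses.
public class ConfigurableShutdown {
    public static long timeoutMinutes() {
        // Long.getLong reads the system property, falling back to 10.
        return Long.getLong("test.beeline.parallel.timeout", 10L);
    }

    public static boolean shutdownAndWait(ExecutorService executor) {
        executor.shutdown();
        try {
            // true if all tasks finished within the configured window
            return executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
        } catch (InterruptedException exc) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(exc);
        }
    }
}
```

Slow test environments can then pass e.g. `-Dtest.beeline.parallel.timeout=30` on the Maven command line without touching Parallelized.java.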
[jira] [Updated] (HIVE-17052) Remove logging of predicate filters
[ https://issues.apache.org/jira/browse/HIVE-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-17052:
------------------------------
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved  (was: Patch Available)

Thanks for the patch [~Yibing]. Pushed to master.

> Remove logging of predicate filters
> -----------------------------------
>
>                 Key: HIVE-17052
>                 URL: https://issues.apache.org/jira/browse/HIVE-17052
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Barna Zsombor Klara
>            Assignee: Yibing Shi
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17052.1.patch
>
> HIVE-16869 added the filter predicate to the debug log of HS2, but since these filters may contain sensitive information they should not be logged.
> The log statement should be changed back to its original form.
[jira] [Comment Edited] (HIVE-16679) Missing ASF header on properties file in ptest2 project
[ https://issues.apache.org/jira/browse/HIVE-16679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101362#comment-16101362 ]

Peter Vary edited comment on HIVE-16679 at 7/26/17 8:51 AM:
------------------------------------------------------------
Thanks for the patch [~zsombor.klara]. Pushed to master.

was (Author: pvary):
Thanks for the patch [~zsombor.klara]

> Missing ASF header on properties file in ptest2 project
> --------------------------------------------------------
>
>                 Key: HIVE-16679
>                 URL: https://issues.apache.org/jira/browse/HIVE-16679
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>            Priority: Trivial
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16679.01.patch
>
> The ASF header is missing on {{testutils/ptest2//conf/deployed/master-mr2.properties}}, causing the build of the ptest2 project to fail on a RAT check.
[jira] [Updated] (HIVE-16679) Missing ASF header on properties file in ptest2 project
[ https://issues.apache.org/jira/browse/HIVE-16679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-16679:
------------------------------
    Resolution: Fixed
    Status: Resolved  (was: Patch Available)

Thanks for the patch [~zsombor.klara]

> Missing ASF header on properties file in ptest2 project
> --------------------------------------------------------
>
>                 Key: HIVE-16679
>                 URL: https://issues.apache.org/jira/browse/HIVE-16679
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>            Priority: Trivial
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16679.01.patch
>
> The ASF header is missing on {{testutils/ptest2//conf/deployed/master-mr2.properties}}, causing the build of the ptest2 project to fail on a RAT check.
[jira] [Commented] (HIVE-16954) LLAP IO: better debugging
[ https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101303#comment-16101303 ]

Lefty Leverenz commented on HIVE-16954:
---------------------------------------
Doc note & question: This adds *hive.llap.io.trace.size* and *hive.llap.io.trace.always.dump* to HiveConf.java, so they need to be documented in the wiki. The default value of *hive.llap.io.trace.always.dump* was true in the initial commit but changed to false in an addendum commit (#20276d2113f669a2ea08480ce76df9bd6b913d09 and #726f270a6e5c720a98ac58f2c4a549e70b45fbad).

* [Configuration Properties -- LLAP I/O | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAPI/O]

Added a TODOC2.4 label.

Question: In the description of *hive.llap.io.trace.always.dump*, what does "the default is on error" mean? It was in the initial commit as well as the addendum commit, so I'm not sure if it refers to the true or false value.

> LLAP IO: better debugging
> -------------------------
>
>                 Key: HIVE-16954
>                 URL: https://issues.apache.org/jira/browse/HIVE-16954
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: TODOC2.4
>             Fix For: 3.0.0, 2.4.0
>
>         Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch
>
[jira] [Updated] (HIVE-16954) LLAP IO: better debugging
[ https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-16954:
----------------------------------
    Labels: TODOC2.4  (was: )

> LLAP IO: better debugging
> -------------------------
>
>                 Key: HIVE-16954
>                 URL: https://issues.apache.org/jira/browse/HIVE-16954
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: TODOC2.4
>             Fix For: 3.0.0, 2.4.0
>
>         Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch
>
[jira] [Updated] (HIVE-17177) move TestSuite.java to the right position
[ https://issues.apache.org/jira/browse/HIVE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saijin Huang updated HIVE-17177:
--------------------------------
    Attachment: HIVE-17177.1.patch

> move TestSuite.java to the right position
> ------------------------------------------
>
>                 Key: HIVE-17177
>                 URL: https://issues.apache.org/jira/browse/HIVE-17177
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Saijin Huang
>            Assignee: Saijin Huang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17177.1.patch
>
> TestSuite.java does not currently belong to the package org.apache.hive.storage.jdbc. Move it to the right position.