[jira] [Commented] (HIVE-17037) Use 1-to-1 Tez edge to avoid unnecessary input data shuffle
[ https://issues.apache.org/jira/browse/HIVE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095819#comment-16095819 ] Lefty Leverenz commented on HIVE-17037: --- Doc note: This adds *hive.optimize.joinreducededuplication* to HiveConf.java, so it will need to be documented in the wiki. * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Added a TODOC3.0 label. > Use 1-to-1 Tez edge to avoid unnecessary input data shuffle > --- > > Key: HIVE-17037 > URL: https://issues.apache.org/jira/browse/HIVE-17037 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-17037.01.patch, HIVE-17037.02.patch, > HIVE-17037.03.patch, HIVE-17037.patch > > > As an example, consider the following query: > {code:sql} > SELECT * > FROM ( > SELECT a.value > FROM src1 a > JOIN src1 b > ON (a.value = b.value) > GROUP BY a.value > ) a > JOIN src > ON (a.value = src.value); > {code} > Currently, the plan generated for Tez will contain an unnecessary shuffle > operation between the subquery and the join, since the records produced by > the subquery are already sorted by the value. > This issue is to extend join algorithm selection to be able to shuffle only > some of the inputs for a given join and avoid unnecessary shuffle operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (HIVE-17142) HIVE command to get the column count ?
[ https://issues.apache.org/jira/browse/HIVE-17142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jayanthi R reopened HIVE-17142: --- > HIVE command to get the column count ? > -- > > Key: HIVE-17142 > URL: https://issues.apache.org/jira/browse/HIVE-17142 > Project: Hive > Issue Type: Wish >Reporter: Jayanthi R > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17037) Use 1-to-1 Tez edge to avoid unnecessary input data shuffle
[ https://issues.apache.org/jira/browse/HIVE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-17037: -- Labels: TODOC3.0 (was: ) > Use 1-to-1 Tez edge to avoid unnecessary input data shuffle > --- > > Key: HIVE-17037 > URL: https://issues.apache.org/jira/browse/HIVE-17037 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-17037.01.patch, HIVE-17037.02.patch, > HIVE-17037.03.patch, HIVE-17037.patch > > > As an example, consider the following query: > {code:sql} > SELECT * > FROM ( > SELECT a.value > FROM src1 a > JOIN src1 b > ON (a.value = b.value) > GROUP BY a.value > ) a > JOIN src > ON (a.value = src.value); > {code} > Currently, the plan generated for Tez will contain an unnecessary shuffle > operation between the subquery and the join, since the records produced by > the subquery are already sorted by the value. > This issue is to extend join algorithm selection to be able to shuffle only > some of the inputs for a given join and avoid unnecessary shuffle operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17141) HIVE command to get the column count ?
[ https://issues.apache.org/jira/browse/HIVE-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095816#comment-16095816 ] Jayanthi R commented on HIVE-17141: --- I want to count how many columns there are in my table. > HIVE command to get the column count ? > -- > > Key: HIVE-17141 > URL: https://issues.apache.org/jira/browse/HIVE-17141 > Project: Hive > Issue Type: Wish >Reporter: Jayanthi R > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-17142) HIVE command to get the column count ?
[ https://issues.apache.org/jira/browse/HIVE-17142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg resolved HIVE-17142. Resolution: Duplicate > HIVE command to get the column count ? > -- > > Key: HIVE-17142 > URL: https://issues.apache.org/jira/browse/HIVE-17142 > Project: Hive > Issue Type: Wish >Reporter: Jayanthi R > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17141) HIVE command to get the column count ?
[ https://issues.apache.org/jira/browse/HIVE-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095811#comment-16095811 ] Vineet Garg commented on HIVE-17141: [~jayanthir] You can use {code:sql} select count(*) from {code} to get a count. I am not sure what this JIRA is for. Could you please elaborate if you are facing any issue? If you have any questions you can use the d...@hive.apache.org mailing list. > HIVE command to get the column count ? > -- > > Key: HIVE-17141 > URL: https://issues.apache.org/jira/browse/HIVE-17141 > Project: Hive > Issue Type: Wish >Reporter: Jayanthi R > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
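[Editor's note] Worth noting for readers of the thread above: {{count(*)}} returns the number of rows, while the column count is table metadata, which {{DESCRIBE <table>}} exposes. A hedged Python sketch (a hypothetical helper, not part of Hive) that derives a column count from DESCRIBE-style output rows:

```python
# Hypothetical helper (not part of Hive): count a table's columns from the
# (name, type, comment) rows that `DESCRIBE <table>` returns. A row whose
# name field is blank marks the start of the partition-information section,
# so counting stops there.
def column_count(describe_rows):
    count = 0
    for name, *_ in describe_rows:
        if not name.strip():
            break  # partition-info separator reached
        count += 1
    return count

rows = [
    ("col", "int", ""),
    ("part_col", "int", ""),
    ("", "", ""),                        # blank separator row
    ("# Partition Information", "", ""),
]
print(column_count(rows))  # 2
```

The same number can be read off by hand from the CLI: run `DESCRIBE <table>` and count the rows above the partition-information section.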
[jira] [Commented] (HIVE-16369) Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only)
[ https://issues.apache.org/jira/browse/HIVE-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095798#comment-16095798 ] Lefty Leverenz commented on HIVE-16369: --- Doc note: This adds *hive.vectorized.execution.ptf.enabled* to HiveConf.java, so it will need to be documented in the wiki. * [Configuration Properties -- Vectorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization] Added a TODOC3.0 label. Acronym clarification: PTF means partitioned table function ... right? > Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only) > - > > Key: HIVE-16369 > URL: https://issues.apache.org/jira/browse/HIVE-16369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-16369.01.patch, HIVE-16369.02.patch, > HIVE-16369.04.patch, HIVE-16369.05.patch.tar.gz, HIVE-16369.06.patch, > HIVE-16369.07.patch, HIVE-16369.091.patch, HIVE-16369.092.patch, > HIVE-16369.093.patch, HIVE-16369.094.patch, HIVE-16369.095.patch, > HIVE-16369.097.patch, HIVE-16369.098.patch, HIVE-16369.0991.patch, > HIVE-16369.0992.patch, HIVE-16369.0993.patch, HIVE-16369.0994.patch, > HIVE-16369.099.patch, HIVE-16369.09.patch > > > Vectorize a subset of the current PTFOperator window function support. The first > phase doesn't include custom PRECEDING / FOLLOWING window frame clauses. > Since we don't have unbounded support yet (i.e., spilling to disk), the enabling > variable hive.vectorized.execution.ptf.enabled is off by default. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095796#comment-16095796 ] Ke Jia commented on HIVE-17139: --- With this patch, I tested "select case when a=1 then trim(b) end from test_orc_5000" on my development machine. The data scale is almost 50 million records in table test_orc_5000(a int, b string) stored as ORC. The execution engine is Spark. I ran three experiments; the table below shows the average values. The results show the Spark execution time dropped from 35.76s to 32.57s, the time cost of VectorSelectOperator from 3.12s to 0.89s, and the count of THEN-expression evaluations from 5000712 to 4735. || ||Non-optimization||Optimization||Improvement|| |HoS|35.76s|32.57s|8.9%| |VectorSelectOperator|3.12s|0.89s|71.5%| |count|5000712|4735|99.9%| > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch > > > The CASE WHEN and IF statement execution for Hive vectorization is not > optimal: in the current implementation, all the conditional and else > expressions are evaluated. The optimized approach is to update the selected > array of the batch parameter after the conditional expression is executed, so > the else expression processes only the selected rows instead of all of them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
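[Editor's note] The selected-array idea described in HIVE-17139 can be illustrated outside Hive. Below is a hedged Python sketch (Hive's actual implementation is Java vector-expression code operating on column batches; the batch layout and function names here are simplified assumptions), showing each branch of a CASE WHEN evaluated only over the rows its condition selects rather than over the whole batch:

```python
# Simplified model of vectorized CASE WHEN cond THEN then_expr ELSE else_expr:
# evaluate the condition once over the batch to build a selected index list,
# then run each branch only on its own rows instead of on every row.
def vectorized_case_when(batch, cond, then_expr, else_expr):
    flags = [cond(row) for row in batch]           # one condition pass
    selected = [i for i, f in enumerate(flags) if f]
    rest = [i for i, f in enumerate(flags) if not f]
    out = [None] * len(batch)
    for i in selected:                             # THEN on matching rows only
        out[i] = then_expr(batch[i])
    for i in rest:                                 # ELSE on the remainder only
        out[i] = else_expr(batch[i])
    return out

rows = [1, 2, 1, 3]
result = vectorized_case_when(rows, lambda a: a == 1,
                              lambda a: "one", lambda a: None)
print(result)  # ['one', None, 'one', None]
```

With a selective condition this is the source of the reported speedup: the expensive branch (trim(b) in the benchmarked query) runs on the selected rows only, instead of on all 50 million.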
[jira] [Updated] (HIVE-13125) Support masking and filtering of rows/columns
[ https://issues.apache.org/jira/browse/HIVE-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13125: --- Attachment: ColumnMaskingInsertDesign.docx > Support masking and filtering of rows/columns > - > > Key: HIVE-13125 > URL: https://issues.apache.org/jira/browse/HIVE-13125 > Project: Hive > Issue Type: New Feature > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: ColumnMaskingInsertDesign.docx, HIVE-13125.01.patch, > HIVE-13125.02.patch, HIVE-13125.03.patch, HIVE-13125.04.patch, > HIVE-13125.final.patch > > > Traditionally, access control at the row and column level is implemented > through views. Using views as an access control method works well only when > access rules, restrictions, and conditions are monolithic and simple. It > however becomes ineffective when view definitions become too complex because > of the complexity and granularity of privacy and security policies. It also > becomes costly when a large number of views must be manually updated and > maintained. In addition, the ability to update views proves to be > challenging. As privacy and security policies evolve, required updates to > views may negatively affect the security logic particularly when database > applications reference the views directly by name. HIVE row and column access > control helps resolve all these problems. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16369) Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only)
[ https://issues.apache.org/jira/browse/HIVE-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-16369: -- Labels: TODOC3.0 (was: ) > Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only) > - > > Key: HIVE-16369 > URL: https://issues.apache.org/jira/browse/HIVE-16369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-16369.01.patch, HIVE-16369.02.patch, > HIVE-16369.04.patch, HIVE-16369.05.patch.tar.gz, HIVE-16369.06.patch, > HIVE-16369.07.patch, HIVE-16369.091.patch, HIVE-16369.092.patch, > HIVE-16369.093.patch, HIVE-16369.094.patch, HIVE-16369.095.patch, > HIVE-16369.097.patch, HIVE-16369.098.patch, HIVE-16369.0991.patch, > HIVE-16369.0992.patch, HIVE-16369.0993.patch, HIVE-16369.0994.patch, > HIVE-16369.099.patch, HIVE-16369.09.patch > > > Vectorize a subset of the current PTFOperator window function support. The first > phase doesn't include custom PRECEDING / FOLLOWING window frame clauses. > Since we don't have unbounded support yet (i.e., spilling to disk), the enabling > variable hive.vectorized.execution.ptf.enabled is off by default. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: HIVE-17139.1.patch > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch > > > The CASE WHEN and IF statement execution for Hive vectorization is not > optimal: in the current implementation, all the conditional and else > expressions are evaluated. The optimized approach is to update the selected > array of the batch parameter after the conditional expression is executed, so > the else expression processes only the selected rows instead of all of them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia reassigned HIVE-17139: - > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > > The CASE WHEN and IF statement execution for Hive vectorization is not > optimal: in the current implementation, all the conditional and else > expressions are evaluated. The optimized approach is to update the selected > array of the batch parameter after the conditional expression is executed, so > the else expression processes only the selected rows instead of all of them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleaned up
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095763#comment-16095763 ] PRASHANT GOLASH commented on HIVE-17117: [~mohitsabharwal], I have attached the latest patch. Please have a look and let me know the next steps. > Metalisteners are not notified when threadlocal metaconf is cleaned up > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listener objects. For example: > Request 1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified. > Request 2 > a. HS2 -> HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken a dependency on the meta-conf value, it will still hold the stale value > from Request 1 and could run into issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed
[ https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095761#comment-16095761 ] Rui Li commented on HIVE-17114: --- The latest failures are not related. > HoS: Possible skew in shuffling when data is not really skewed > -- > > Key: HIVE-17114 > URL: https://issues.apache.org/jira/browse/HIVE-17114 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, > HIVE-17114.3.patch > > > Observed in HoS and may apply to other engines as well. > When we join 2 tables on a single int key, we use the key itself as the hash code > in {{ObjectInspectorUtils.hashCode}}: > {code} > case INT: > return ((IntObjectInspector) poi).get(o); > {code} > Suppose the keys are different but are all multiples of 10. If we > choose 10 as #reducers, the shuffle will be skewed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
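[Editor's note] The skew described in HIVE-17114 is easy to reproduce with plain arithmetic. A toy Python sketch (not Hive code; it only models hash-partitioning where the int key is its own hash code, as in the quoted {{ObjectInspectorUtils.hashCode}} snippet):

```python
# Toy reproduction of the skew: hashCode(int key) == key, and the reducer is
# chosen as hashCode % numReducers. If every distinct key happens to be a
# multiple of the reducer count, all rows land in a single reducer even
# though the key distribution itself is not skewed.
def reducer_for(key: int, num_reducers: int) -> int:
    return key % num_reducers  # key used directly as its own hash code

keys = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # distinct, no real skew
assignments = {reducer_for(k, 10) for k in keys}
print(assignments)  # {0}: every key maps to the same reducer
```

With any reducer count that does not divide the common factor of the keys, the same data spreads evenly, which is why the problem depends on the interaction of key values and #reducers rather than on the data alone.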
[jira] [Commented] (HIVE-17034) The spark tar for itests is downloaded every time if md5sum is not installed
[ https://issues.apache.org/jira/browse/HIVE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095759#comment-16095759 ] Rui Li commented on HIVE-17034: --- Hi [~kgyrtkirk], the change here only affects the "re-download" logic: {code} if [[ ! -f $DOWNLOAD_DIR/$tarName ]] then curl -Sso $DOWNLOAD_DIR/$tarName $url else local md5File="$tarName".md5sum curl -Sso $DOWNLOAD_DIR/$md5File "$url".md5sum cd $DOWNLOAD_DIR if type md5sum >/dev/null && ! md5sum -c $md5File; then curl -Sso $DOWNLOAD_DIR/$tarName $url || return 1 fi {code} If the tar doesn't exist in the first place, it'll be downloaded anyway. For "re-download", if the developer really cares about updating the spark tar, I assume he/she will be aware that md5sum is needed. Does that make sense? > The spark tar for itests is downloaded every time if md5sum is not installed > > > Key: HIVE-17034 > URL: https://issues.apache.org/jira/browse/HIVE-17034 > Project: Hive > Issue Type: Test > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17034.1.patch > > > I think we should either skip verifying md5, or fail the build to let the > developer know md5sum is required. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095710#comment-16095710 ] Hive QA commented on HIVE-17087: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878290/HIVE-17087.1.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11095 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_row__id] (batchId=46) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=228) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6103/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6103/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6103/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12878290 - PreCommit-HIVE-Build > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1 where partitioned_table1.part_col in > (select regular_table.col from regular_table join partitioned_table2 on > regular_table.col = partitioned_table2.part_col); > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-4 depends on stages: Stage-2 > Stage-5 depends on stages: Stage-4 > Stage-3 depends on stages: Stage-5 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 4 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select 
Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 >
[jira] [Assigned] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition
[ https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-10567: -- Assignee: Bing Li (was: Thomas Friedrich) > partial scan for rcfile table doesn't work for dynamic partition > > > Key: HIVE-10567 > URL: https://issues.apache.org/jira/browse/HIVE-10567 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.14.0, 1.0.0 >Reporter: Thomas Friedrich >Assignee: Bing Li >Priority: Minor > Labels: rcfile > Attachments: HIVE-10567.1.patch > > > HIVE-3958 added support for partial scan for RCFile. This works fine for > static partitions (for example: analyze table analyze_srcpart_partial_scan > PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan). > For dynamic partitions, the analyze fails with an IOException > "java.io.IOException: No input paths specified in job": > hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS > PARTIALSCAN; > java.io.IOException: No input paths specified in job > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095681#comment-16095681 ] Sahil Takiar edited comment on HIVE-17131 at 7/21/17 2:23 AM: -- Thanks for pointing that out. I can just move the changes for the Serializer and Deserializer interfaces into a separate patch that will only go into branch-2; does that sound reasonable? was (Author: stakiar): Thanks for pointing that out. I changed the target version for this to branch-2 only. > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095681#comment-16095681 ] Sahil Takiar edited comment on HIVE-17131 at 7/21/17 2:18 AM: -- Thanks for pointing that out. I changed the target version for this to branch-2 only. was (Author: stakiar): T > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095681#comment-16095681 ] Sahil Takiar commented on HIVE-17131: - T > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17131: Target Version/s: 2.4.0 (was: 3.0.0) > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095680#comment-16095680 ] Hive QA commented on HIVE-16997: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878270/HIVE-16997.02.patch {color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10998 tests executed *Failed tests:* {noformat} TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[1] (batchId=182) org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[0] (batchId=182) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6102/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6102/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6102/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12878270 - PreCommit-HIVE-Build > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-17087:
--------------------------------
    Summary: Remove unnecessary HoS DPP trees during map-join conversion  (was: Remove HoS DPP tree during map-join conversion)

> Remove unnecessary HoS DPP trees during map-join conversion
> -----------------------------------------------------------
>
>                 Key: HIVE-17087
>                 URL: https://issues.apache.org/jira/browse/HIVE-17087
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-17087.1.patch
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1 where partitioned_table1.part_col in (select regular_table.col from regular_table join partitioned_table2 on regular_table.col = partitioned_table2.part_col);
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-4 depends on stages: Stage-2
>   Stage-5 depends on stages: Stage-4
>   Stage-3 depends on stages: Stage-5
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
>
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       A masked pattern was here
>       Vertices:
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: partitioned_table1
>                   Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: col (type: int), part_col (type: int)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col1 (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: int)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           partition key expr: part_col
>                           Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                           target column name: part_col
>                           target work: Map 3
>   Stage: Stage-4
>     Spark
>       A masked pattern was here
>       Vertices:
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: regular_table
>                   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: col is not null (type: boolean)
>                     Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: col (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                       Spark HashTable Sink Operator
>                         keys:
>                           0 _col0 (type: int)
>                           1 _col0 (type: int)
>                     Select Operator
>                       expressions: _col0 (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: int)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
[jira] [Commented] (HIVE-17087) Remove HoS DPP tree during map-join conversion

[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095675#comment-16095675 ]

Sahil Takiar commented on HIVE-17087:
-------------------------------------
Patch uploaded; will post some more details of the fix soon.
[jira] [Updated] (HIVE-17087) Remove HoS DPP tree during map-join conversion

[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-17087:
--------------------------------
    Attachment: HIVE-17087.1.patch
[jira] [Updated] (HIVE-17087) Remove HoS DPP tree during map-join conversion

[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-17087:
--------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-17087) Remove HoS DPP tree during map-join conversion

[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-17087:
--------------------------------
    Summary: Remove HoS DPP tree during map-join conversion  (was: HoS Query with multiple Partition Pruning Sinks + subquery has incorrect explain)
[jira] [Updated] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct
[ https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-17116:
--------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Vectorization: Add infrastructure for vectorization of ROW__ID struct
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17116
>                 URL: https://issues.apache.org/jira/browse/HIVE-17116
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17116.01.patch, HIVE-17116.02.patch
>
> Supports new ACID work.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

[ https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095668#comment-16095668 ]

Matt McCline commented on HIVE-17116:
-------------------------------------
In a subsequent JIRA, Teddy will actually make vectorized ROW__ID work by filling in that column with values -- probably from within the ACID ORC reader(s). See line 815 in VectorMapOperator for the relevant UNDONE.
[jira] [Updated] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

[ https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-17116:
--------------------------------
    Fix Version/s: 3.0.0
[jira] [Commented] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

[ https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095667#comment-16095667 ]

Matt McCline commented on HIVE-17116:
-------------------------------------
Thank you Teddy for your code review. Committed to master.
[jira] [Updated] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

[ https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-17116:
--------------------------------
    Summary: Vectorization: Add infrastructure for vectorization of ROW__ID struct  (was: Vectorization: Enable vectorization of ROW__ID struct)
[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095620#comment-16095620 ]

Hive QA commented on HIVE-17128:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878269/HIVE-17128.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11092 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6101/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6101/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6101/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878269 - PreCommit-HIVE-Build

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-17128
>                 URL: https://issues.apache.org/jira/browse/HIVE-17128
>             Project: Hive
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Andrew Sherman
>            Assignee: Andrew Sherman
>         Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 RoutingAppender to automatically output the log for each query into each individual operation log file. As log4j does not know when a query is finished, it keeps the OutputStream in the Appender open even after the query completes. The stream holds a file descriptor, so we leak file descriptors. Note that we are already careful to close any streams reading from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510], which uses reflection to close the appender. The test in TestOperationLoggingLayout will be extended to check that the Appender is closed.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095582#comment-16095582 ]

Hive QA commented on HIVE-16077:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878267/HIVE-16077.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11094 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoinopt3] (batchId=21)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6100/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6100/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6100/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878267 - PreCommit-HIVE-Build

> UPDATE/DELETE fails with numBuckets > numReducers
> -------------------------------------------------
>
>                 Key: HIVE-16077
>                 URL: https://issues.apache.org/jira/browse/HIVE-16077
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.1.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch
>
> don't think we have such tests for Acid path
> check if they exist for non-acid path

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095528#comment-16095528 ]

Andrew Sherman commented on HIVE-17128:
---------------------------------------
Hi [~aihuaxu], can you review this change please? [There is a review board diff here|https://reviews.apache.org/r/61010/] Thanks
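For context, the LOG4J2-510-style workaround referenced in this fix reaches a private field of an appender via reflection and closes it once the operation finishes. A minimal, self-contained Java sketch of that reflection technique (the `LeakyAppender` and `manager` names below are hypothetical stand-ins, not Hive's or Log4j's actual classes):

```java
import java.io.Closeable;
import java.lang.reflect.Field;

public class ReflectiveClose {

    // Stand-in for an appender whose output stream is held in a private
    // field and never closed by the framework (names are hypothetical).
    public static class LeakyAppender {
        private final TrackedStream manager = new TrackedStream();
        public TrackedStream getManager() { return manager; }
    }

    // A Closeable that records whether close() was ever called.
    public static class TrackedStream implements Closeable {
        public boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // The workaround: reach the private field by name via reflection and
    // close it ourselves; returns true only if a Closeable was closed.
    public static boolean closePrivateCloseable(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            Object value = f.get(target);
            if (value instanceof Closeable) {
                ((Closeable) value).close();
                return true;
            }
            return false;
        } catch (ReflectiveOperationException | java.io.IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LeakyAppender appender = new LeakyAppender();
        boolean ok = closePrivateCloseable(appender, "manager");
        System.out.println(ok + " " + appender.getManager().closed); // true true
    }
}
```

The trade-off with this approach is fragility: it depends on a private field name, so it can break silently on a library upgrade, which is why the patch also extends TestOperationLoggingLayout to verify the appender really gets closed.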
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-16997:
-----------------------------------
    Status: Patch Available  (was: Open)

> Extend object store to store bit vectors
> ----------------------------------------
>
>                 Key: HIVE-16997
>                 URL: https://issues.apache.org/jira/browse/HIVE-16997
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch
>
> This patch includes: (1) a new serde for FMSketch, (2) a schema change for Derby and MySQL, (3) support for the date type, and (4) refactoring of the extrapolation and merge code.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-16997:
-----------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-16997:
-----------------------------------
    Attachment: HIVE-16997.02.patch
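For readers unfamiliar with the FMSketch being serialized in HIVE-16997: a Flajolet-Martin sketch estimates the number of distinct values in a fixed-size bitmap, and two sketches merge with a plain bitwise OR, which is what makes per-partition NDV statistics combinable in the metastore. A toy sketch of the idea (illustrative only, not Hive's actual FMSketch implementation; the mixing hash is an arbitrary stand-in):

```java
public class FMSketch {
    // 32-slot bitmap; bit r is set when some value's hash had r trailing zeros.
    private int bitmap = 0;

    public void add(long value) {
        long h = value * 0x9E3779B97F4A7C15L;  // arbitrary mixing hash (illustrative)
        h ^= h >>> 32;
        int r = (h == 0) ? 31 : Long.numberOfTrailingZeros(h);
        bitmap |= 1 << Math.min(r, 31);
    }

    // Merging two sketches is a bitwise OR; this is what lets per-partition
    // column stats be combined without rescanning the data.
    public void merge(FMSketch other) { bitmap |= other.bitmap; }

    public double estimateNdv() {
        int lowestUnsetBit = Integer.numberOfTrailingZeros(~bitmap);
        return Math.pow(2, lowestUnsetBit) / 0.77351;  // Flajolet-Martin correction
    }

    public static void main(String[] args) {
        FMSketch s = new FMSketch();
        for (long i = 1; i <= 1000; i++) s.add(i);
        System.out.println("estimated NDV: " + s.estimateNdv());
    }
}
```

Because the merged state is just the OR of the bitmaps, storing the bit vector itself (rather than only the final NDV number) is what allows the extrapolation and merge code mentioned in the patch to work across partitions.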
[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Sherman updated HIVE-17128:
----------------------------------
    Attachment: HIVE-17128.2.patch
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers

[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-16077:
----------------------------------
    Attachment: HIVE-16077.02.patch
[jira] [Updated] (HIVE-17085) ORC file merge/concatenation should do full schema check
[ https://issues.apache.org/jira/browse/HIVE-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17085: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) Test failures are unrelated to this patch. Committed to branch-2 and master. Thanks Zoltan for the review! > ORC file merge/concatenation should do full schema check > > > Key: HIVE-17085 > URL: https://issues.apache.org/jira/browse/HIVE-17085 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0, 2.3.0, 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17085.1.patch, HIVE-17085.2.patch > > > ORC merging/concatenation compatibility check just looks for column count > match at outer level. ORC schema evolution now supports inner structs as > well. With that outer level column count will match but inner column level > will not match. Compatibility check should do full schema match before > merging/concatenation. This issue will not cause data loss but will cause > task failures with exception like below > {code} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to close > OrcFileMergeOperator > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:247) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212) > ... 
16 more > Caused by: java.lang.IllegalArgumentException: Column has wrong number of > index entries found: 0 expected: 1 > at > org.apache.orc.impl.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:695) > at > org.apache.orc.impl.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:2147) > at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2661) > at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834) > at > org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:243) > ... 19 more > {code} > Concatenation should also make sure writer version is matching (it currently > checks only file version match). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
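The gap described above (outer column counts match, inner struct fields do not) can be shown with a toy recursive comparison. This is an illustrative sketch only; the `Type` class is a hypothetical stand-in for ORC's `TypeDescription`, and the real patch works against the ORC schema API.

```java
import java.util.Arrays;
import java.util.List;

public class SchemaCheck {

    // Minimal stand-in for a nested schema node.
    public static class Type {
        final String category;     // e.g. "struct", "int", "string"
        final List<Type> children; // nested fields for struct types

        public Type(String category, Type... children) {
            this.category = category;
            this.children = Arrays.asList(children);
        }
    }

    // Old behavior: only the outer column counts are compared.
    public static boolean shallowMatch(Type a, Type b) {
        return a.children.size() == b.children.size();
    }

    // Full check: categories and children must match recursively.
    public static boolean fullMatch(Type a, Type b) {
        if (!a.category.equals(b.category)
                || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!fullMatch(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same outer column count, different inner struct fields.
        Type a = new Type("struct", new Type("int"),
                new Type("struct", new Type("int")));
        Type b = new Type("struct", new Type("int"),
                new Type("struct", new Type("int"), new Type("string")));
        System.out.println(shallowMatch(a, b)); // prints "true" - old check passes
        System.out.println(fullMatch(a, b));    // prints "false" - full check catches it
    }
}
```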
[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095414#comment-16095414 ] Ashutosh Chauhan commented on HIVE-17131: - I think we shall do HIVE-16374 instead for master. > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14181) DROP TABLE in hive doesn't Throw Error
[ https://issues.apache.org/jira/browse/HIVE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095376#comment-16095376 ] Aihua Xu commented on HIVE-14181: - [~szita] That is a challenge to keep HDFS in sync with metadata during insertion and deletion. Agree that we can at least print some kind of warning to the user and let the user clean up the data manually. Another thought: when HDFS trash is turned off, throwing the warning seems to be all we can do; when HDFS trash is turned on, since we can recover HDFS files, we can keep HDFS and metadata in sync by recovering the HDFS files and reverting the metadata change if anything fails. What do you think? > DROP TABLE in hive doesn't Throw Error > -- > > Key: HIVE-14181 > URL: https://issues.apache.org/jira/browse/HIVE-14181 > Project: Hive > Issue Type: Bug > Environment: Hive 1.1.0 > CDH 5.5.1-1 >Reporter: Pranjal Singh >Assignee: Adam Szita > Labels: easyfix > Attachments: HIVE-14181.1.patch, HIVE-14181.2.patch > > > drop table table_name doesn't throw an error if drop table fails. > I was dropping a table and my trash didn't have enough space to hold the > table but the drop table command showed success and the table wasn't deleted. > But the hadoop fs -rm -r /hive/xyz.db/table_name/ gave an error "Failed to > move to trash" because I did not have enough space quota in my trash. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path
[ https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17138: -- Description: For bucketed tables, FileSinkOperator is expected (in some cases) to produce a specific number of files even if they are empty. FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty. This doesn't properly work for the Acid path. For Insert, the OrcRecordUpdater(s) is set up in createBucketForFileIdx() which creates the actual bucketN file (as of HIVE-14007, it does it regardless of whether RecordUpdater sees any rows). This causes empty (i.e. ORC metadata only) bucket files to be created for multiFileSpray=true if a particular FileSinkOperator.process() sees at least 1 row. For example, {noformat} create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored as orc TBLPROPERTIES ('transactional'='true'); insert into fourbuckets values(0,1),(1,1); with mapreduce.job.reduces = 1 or 2 {noformat} For the Update/Delete path, OrcRecordWriter is created lazily when the 1st row that needs to land there is seen. Thus it never creates empty buckets no matter what the value of _skipFiles_ in closeOp(boolean). Once Split Update does the split early (in the operator pipeline) only the Insert path will matter since base and delta are the only files split computation, etc. looks at. delete_delta is only for Acid internals so there is never any reason to create empty files there. was: For bucketed tables, FileSinkOperator is expected (in some cases) to produce a specific number of files even if they are empty. FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty. This doesn't properly work for the Acid path. For Insert, the OrcRecordUpdater(s) is set up in createBucketForFileIdx() which creates the actual bucketN file (as of HIVE-14007, it does it regardless of whether RecordUpdater sees any rows). 
This causes empty (i.e. ORC metadata only) bucket files to be created. For example, {noformat} create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored as orc TBLPROPERTIES ('transactional'='true'); insert into fourbuckets values(0,1),(1,1); {noformat} For the Update/Delete path, OrcRecordWriter is created lazily when the 1st row that needs to land there is seen. Thus it never creates empty buckets no matter what the value of _skipFiles_ in closeOp(boolean). Once Split Update does the split early (in the operator pipeline) only the Insert path will matter since base and delta are the only files split computation, etc. looks at. delete_delta is only for Acid internals so there is never any reason to create empty files there. > FileSinkOperator doesn't create empty files for acid path > - > > Key: HIVE-17138 > URL: https://issues.apache.org/jira/browse/HIVE-17138 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > For bucketed tables, FileSinkOperator is expected (in some cases) to produce > a specific number of files even if they are empty. > FileSinkOperator.closeOp(boolean abort) has logic to create files even if > empty. > This doesn't properly work for the Acid path. For Insert, the > OrcRecordUpdater(s) is set up in createBucketForFileIdx() which creates the > actual bucketN file (as of HIVE-14007, it does it regardless of whether > RecordUpdater sees any rows). This causes empty (i.e. ORC metadata only) > bucket files to be created for multiFileSpray=true if a particular > FileSinkOperator.process() sees at least 1 row. 
For example, > {noformat} > create table fourbuckets (a int, b int) clustered by (a) into 4 buckets > stored as orc TBLPROPERTIES ('transactional'='true'); > insert into fourbuckets values(0,1),(1,1); > with mapreduce.job.reduces = 1 or 2 > {noformat} > For the Update/Delete path, OrcRecordWriter is created lazily when the 1st row > that needs to land there is seen. Thus it never creates empty buckets no > matter what the value of _skipFiles_ in closeOp(boolean). > Once Split Update does the split early (in the operator pipeline) only the Insert > path will matter since base and delta are the only files split computation, > etc. looks at. delete_delta is only for Acid internals so there is never any > reason to create empty files there. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path
[ https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17138: - > FileSinkOperator doesn't create empty files for acid path > - > > Key: HIVE-17138 > URL: https://issues.apache.org/jira/browse/HIVE-17138 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > For bucketed tables, FileSinkOperator is expected (in some cases) to produce > a specific number of files even if they are empty. > FileSinkOperator.closeOp(boolean abort) has logic to create files even if > empty. > This doesn't properly work for the Acid path. For Insert, the > OrcRecordUpdater(s) is set up in createBucketForFileIdx() which creates the > actual bucketN file (as of HIVE-14007, it does it regardless of whether > RecordUpdater sees any rows). This causes empty (i.e. ORC metadata only) > bucket files to be created. For example, > {noformat} > create table fourbuckets (a int, b int) clustered by (a) into 4 buckets > stored as orc TBLPROPERTIES ('transactional'='true'); > insert into fourbuckets values(0,1),(1,1); > {noformat} > For the Update/Delete path, OrcRecordWriter is created lazily when the 1st row > that needs to land there is seen. Thus it never creates empty buckets no > matter what the value of _skipFiles_ in closeOp(boolean). > Once Split Update does the split early (in the operator pipeline) only the Insert > path will matter since base and delta are the only files split computation, > etc. looks at. delete_delta is only for Acid internals so there is never any > reason to create empty files there. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095297#comment-16095297 ] Hive QA commented on HIVE-17128: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878232/HIVE-17128.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11088 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6099/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6099/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6099/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12878232 - PreCommit-HIVE-Build > Operation Logging leaks file descriptors as the log4j Appender is never closed > -- > > Key: HIVE-17128 > URL: https://issues.apache.org/jira/browse/HIVE-17128 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Andrew Sherman >Assignee: Andrew Sherman > Attachments: HIVE-17128.1.patch > > > [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 > RoutingAppender to automatically output the log for each query into each > individual operation log file. As log4j does not know when a query is > finished it keeps the OutputStream in the Appender open even when the query > completes. The stream holds a file descriptor and so we leak file > descriptors. Note that we are already careful to close any streams reading > from the operation log file. > h2. Fix > To fix this we use a technique described in the comments of [LOG4J2-510] > which uses reflection to close the appender. The test in > TestOperationLoggingLayout will be extended to check that the Appender is > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16787) Fix itests in branch-2.2
[ https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-16787: - Release Note: (was: I just committed this. Thanks for the review, Alan. ) I just committed this. Thanks for the review, Alan. > Fix itests in branch-2.2 > > > Key: HIVE-16787 > URL: https://issues.apache.org/jira/browse/HIVE-16787 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > > The itests are broken in branch 2.2 and need to be fixed before release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16787) Fix itests in branch-2.2
[ https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-16787. -- Resolution: Fixed > Fix itests in branch-2.2 > > > Key: HIVE-16787 > URL: https://issues.apache.org/jira/browse/HIVE-16787 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > > The itests are broken in branch 2.2 and need to be fixed before release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (HIVE-16787) Fix itests in branch-2.2
[ https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reopened HIVE-16787: -- > Fix itests in branch-2.2 > > > Key: HIVE-16787 > URL: https://issues.apache.org/jira/browse/HIVE-16787 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > > The itests are broken in branch 2.2 and need to be fixed before release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16787) Fix itests in branch-2.2
[ https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-16787. -- Resolution: Fixed Release Note: I just committed this. Thanks for the review, Alan. > Fix itests in branch-2.2 > > > Key: HIVE-16787 > URL: https://issues.apache.org/jira/browse/HIVE-16787 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > > The itests are broken in branch 2.2 and need to be fixed before release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16787) Fix itests in branch-2.2
[ https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095286#comment-16095286 ] ASF GitHub Bot commented on HIVE-16787: --- Github user asfgit closed the pull request at: https://github.com/apache/hive/pull/207 > Fix itests in branch-2.2 > > > Key: HIVE-16787 > URL: https://issues.apache.org/jira/browse/HIVE-16787 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > > The itests are broken in branch 2.2 and need to be fixed before release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16366) Hive 2.3 release planning
[ https://issues.apache.org/jira/browse/HIVE-16366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16366: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Hive 2.3 release planning > - > > Key: HIVE-16366 > URL: https://issues.apache.org/jira/browse/HIVE-16366 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Blocker > Labels: 2.3.0 > Fix For: 2.3.0 > > Attachments: HIVE-16366-branch-2.3.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14181) DROP TABLE in hive doesn't Throw Error
[ https://issues.apache.org/jira/browse/HIVE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095257#comment-16095257 ] Adam Szita commented on HIVE-14181: --- Hi [~daijy], we took a look at this with [~aihuaxu] and [~vihangk1] a while back but couldn't reach consensus. Please take a look at the RB link too. I agree that rolling back the HMS transaction (as the latest patch does) is a bad idea because the data deletion may fail partially, leaving intact metadata and corrupted data - that's a bad combo. However I still think that we should at least throw the exception back to the user to indicate that something went wrong while actually deleting the data, and not just leave a warning in the log of HMS. > DROP TABLE in hive doesn't Throw Error > -- > > Key: HIVE-14181 > URL: https://issues.apache.org/jira/browse/HIVE-14181 > Project: Hive > Issue Type: Bug > Environment: Hive 1.1.0 > CDH 5.5.1-1 >Reporter: Pranjal Singh >Assignee: Adam Szita > Labels: easyfix > Attachments: HIVE-14181.1.patch, HIVE-14181.2.patch > > > drop table table_name doesn't throw an error if drop table fails. > I was dropping a table and my trash didn't have enough space to hold the > table but the drop table command showed success and the table wasn't deleted. > But the hadoop fs -rm -r /hive/xyz.db/table_name/ gave an error "Failed to > move to trash" because I did not have enough space quota in my trash. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
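The behavior argued for in the comment above can be sketched as follows: the metadata drop is not rolled back, but a failure while deleting the data is propagated instead of only logged. All names here (`Fs`, `dropTableData`, `MetaException`) are hypothetical stand-ins, not the actual HMS API.

```java
import java.io.IOException;

public class DropTableSketch {

    public static class MetaException extends Exception {
        public MetaException(String msg, Throwable cause) { super(msg, cause); }
    }

    public interface Fs { void delete(String path) throws IOException; }

    // Metadata is assumed dropped already and is deliberately not rolled back
    // (partial data deletion would leave intact metadata over corrupt data).
    public static void dropTableData(Fs fs, String path) throws MetaException {
        try {
            fs.delete(path);
        } catch (IOException e) {
            // Old behavior: log a warning and report success to the user.
            // Proposed: propagate, so the user knows to clean up manually.
            throw new MetaException("table metadata dropped, but data at "
                    + path + " could not be deleted", e);
        }
    }

    public static void main(String[] args) {
        Fs failing = p -> { throw new IOException("Failed to move to trash"); };
        try {
            dropTableData(failing, "/hive/xyz.db/table_name");
            System.out.println("no error reported");
        } catch (MetaException e) {
            System.out.println("surfaced: " + e.getMessage());
        }
    }
}
```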
[jira] [Commented] (HIVE-17137) Fix javolution conflict
[ https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095207#comment-16095207 ] Hive QA commented on HIVE-17137: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878218/HIVE-17137.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11088 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6097/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6097/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6097/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12878218 - PreCommit-HIVE-Build > Fix javolution conflict > --- > > Key: HIVE-17137 > URL: https://issues.apache.org/jira/browse/HIVE-17137 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-17137.01.patch > > > as reported by [~jcamachorodriguez] > {code} > [WARNING] Some problems were encountered while building the effective model > for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT > [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must > be unique: javolution:javolution:jar -> duplicate declaration of version > ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], > /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, > column 17 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Sherman updated HIVE-17128: -- Attachment: HIVE-17128.1.patch > Operation Logging leaks file descriptors as the log4j Appender is never closed > -- > > Key: HIVE-17128 > URL: https://issues.apache.org/jira/browse/HIVE-17128 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Andrew Sherman >Assignee: Andrew Sherman > Attachments: HIVE-17128.1.patch > > > [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 > RoutingAppender to automatically output the log for each query into each > individual operation log file. As log4j does not know when a query is > finished it keeps the OutputStream in the Appender open even when the query > completes. The stream holds a file descriptor and so we leak file > descriptors. Note that we are already careful to close any streams reading > from the operation log file. > h2. Fix > To fix this we use a technique described in the comments of [LOG4J2-510] > which uses reflection to close the appender. The test in > TestOperationLoggingLayout will be extended to check that the Appender is > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Sherman updated HIVE-17128: -- Status: Patch Available (was: Open) > Operation Logging leaks file descriptors as the log4j Appender is never closed > -- > > Key: HIVE-17128 > URL: https://issues.apache.org/jira/browse/HIVE-17128 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Andrew Sherman >Assignee: Andrew Sherman > Attachments: HIVE-17128.1.patch > > > [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 > RoutingAppender to automatically output the log for each query into each > individual operation log file. As log4j does not know when a query is > finished it keeps the OutputStream in the Appender open even when the query > completes. The stream holds a file descriptor and so we leak file > descriptors. Note that we are already careful to close any streams reading > from the operation log file. > h2. Fix > To fix this we use a technique described in the comments of [LOG4J2-510] > which uses reflection to close the appender. The test in > TestOperationLoggingLayout will be extended to check that the Appender is > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17117 started by PRASHANT GOLASH. -- > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
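The fix the issue above asks for can be sketched as: when the per-connection meta-conf overrides are discarded, fire a change event reverting each key to its default so no listener keeps a stale value. Names here (`MetaListener`, `cleanupMetaConf`) are hypothetical stand-ins, not the HMS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetaConfCleanup {

    public interface MetaListener {
        void onConfigChange(String key, String newValue);
    }

    public static void cleanupMetaConf(Map<String, String> overrides,
                                       Map<String, String> defaults,
                                       List<MetaListener> listeners) {
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            String def = defaults.get(e.getKey());
            // Fire a change event for the revert, mirroring the event fired
            // when the override was first set.
            for (MetaListener l : listeners) {
                l.onConfigChange(e.getKey(), def);
            }
        }
        overrides.clear();
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("metaconf.try.direct.sql", "false"); // hypothetical key
        Map<String, String> defaults = new HashMap<>();
        defaults.put("metaconf.try.direct.sql", "true");

        Map<String, String> seen = new HashMap<>();  // the listener's view
        List<MetaListener> listeners = new ArrayList<>();
        listeners.add(seen::put);

        cleanupMetaConf(overrides, defaults, listeners);
        System.out.println(seen.get("metaconf.try.direct.sql")); // prints "true"
        System.out.println(overrides.isEmpty());                 // prints "true"
    }
}
```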
[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRASHANT GOLASH updated HIVE-17117: --- Status: In Progress (was: Patch Available) > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work stopped] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17117 stopped by PRASHANT GOLASH. -- > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17083) DagUtils overwrites any credentials already added
[ https://issues.apache.org/jira/browse/HIVE-17083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095176#comment-16095176 ] Josh Elser commented on HIVE-17083: --- Thanks here as well, Sushanth! > DagUtils overwrites any credentials already added > - > > Key: HIVE-17083 > URL: https://issues.apache.org/jira/browse/HIVE-17083 > Project: Hive > Issue Type: Bug > Components: Tez >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 3.0.0 > > Attachments: HIVE-17083.patch > > > While working with a StorageHandler with hive.execution.engine=tez, I found > that the credentials the storage handler was adding were not propagating to > the dag. > After a big of debugging/git-log, I found that DagUtils was overwriting the > credentials which were already set. A quick patch locally seem to make things > work again. Will put together a quick unit test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095172#comment-16095172 ] PRASHANT GOLASH commented on HIVE-17117: Attached the latest patch. Thanks [~mohitsabharwal] & [~csun] > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleaned up
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRASHANT GOLASH updated HIVE-17117: --- Attachment: HIVE-17117.1.patch > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleaned up
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRASHANT GOLASH updated HIVE-17117: --- Attachment: (was: HIVE-17117.patch) > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleaned up
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRASHANT GOLASH updated HIVE-17117: --- Attachment: HIVE-17117.patch > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17130) Add automated tests to check backwards compatibility of core APIs
[ https://issues.apache.org/jira/browse/HIVE-17130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095130#comment-16095130 ] Sahil Takiar edited comment on HIVE-17130 at 7/20/17 6:12 PM: -- Here are the relevant JIRAs from other Apache projects that have done similar things: HADOOP-13583 HBASE-12808 and HBASE-18020 KUDU-1265 SPARK-1094 (Spark uses a Scala-specific tool called [MiMa|https://github.com/typesafehub/migration-manager]) was (Author: stakiar): Here are the relevant JIRAs from other Apache Projects who have done similar things: HADOOP-13583 HBASE-12808 and HBASE-18020 KUDU-1265 > Add automated tests to check backwards compatibility of core APIs > - > > Key: HIVE-17130 > URL: https://issues.apache.org/jira/browse/HIVE-17130 > Project: Hive > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > We should add automated tests that check we are not adding backwards-incompatible > changes to core APIs (e.g. HMS APIs, SerDe APIs, UDF APIs, etc.). > Other Apache components, such as HBase and Hadoop, already have existing > checks. They are largely based on the japi-compliance-checker: > https://lvc.github.io/japi-compliance-checker/ > The nice thing about the japi-compliance-checker is that it can identify an > interface as "any class with a specified Java annotation", so we can use the > compliance-checker to check for backwards compatibility of any classes > annotated with InterfaceAudience.Public > Ideally, we can build this check into our pre-commit job, or get it into > YETUS, since we are already working on adding YETUS support to Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
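The annotation-driven idea in HIVE-17130 — treating "public API" as any class carrying a marker annotation — can be illustrated with a small self-contained sketch. The @Public annotation below is hypothetical, standing in for Hive's real InterfaceAudience annotations; a tool like japi-compliance-checker would apply the same membership test before diffing two releases.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sketch: membership in the checked API surface is decided
// purely by the presence of a marker annotation on the class.
public class ApiSurfaceSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Public {}

    @Public static class StableSerDe {}   // would be checked for compatibility
    static class InternalHelper {}        // skipped by the checker

    static boolean isPublicApi(Class<?> c) {
        return c.isAnnotationPresent(Public.class);
    }

    public static void main(String[] args) {
        System.out.println(isPublicApi(StableSerDe.class));    // part of the surface
        System.out.println(isPublicApi(InternalHelper.class)); // not part of the surface
    }
}
```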
[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095138#comment-16095138 ] Sahil Takiar commented on HIVE-17131: - [~ashutoshc], [~sershe] saw you both did some work on SerDe APIs in HIVE-15167 and HIVE-4007, so wanted to see if either of you had any thoughts or objections to this. > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17130) Add automated tests to check backwards compatibility of core APIs
[ https://issues.apache.org/jira/browse/HIVE-17130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095130#comment-16095130 ] Sahil Takiar commented on HIVE-17130: - Here are the relevant JIRAs from other Apache projects that have done similar things: HADOOP-13583 HBASE-12808 and HBASE-18020 KUDU-1265 > Add automated tests to check backwards compatibility of core APIs > - > > Key: HIVE-17130 > URL: https://issues.apache.org/jira/browse/HIVE-17130 > Project: Hive > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > We should add automated tests that check we are not adding backwards-incompatible > changes to core APIs (e.g. HMS APIs, SerDe APIs, UDF APIs, etc.). > Other Apache components, such as HBase and Hadoop, already have existing > checks. They are largely based on the japi-compliance-checker: > https://lvc.github.io/japi-compliance-checker/ > The nice thing about the japi-compliance-checker is that it can identify an > interface as "any class with a specified Java annotation", so we can use the > compliance-checker to check for backwards compatibility of any classes > annotated with InterfaceAudience.Public > Ideally, we can build this check into our pre-commit job, or get it into > YETUS, since we are already working on adding YETUS support to Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleaned up
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095091#comment-16095091 ] Mohit Sabharwal commented on HIVE-17117: looks like the patch attached needs to be refreshed. Not the same as one on RB. > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleaned up
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095073#comment-16095073 ] Chao Sun commented on HIVE-17117: - +1 > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listeners objects. For e.g. > Request1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken dependency on the meta conf value, it will still be having stale value > from Request1 and would potentially be having issues. > The correct behavior should be to notify meta listeners on cleanup as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17137) Fix javolution conflict
[ https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095071#comment-16095071 ] Jesus Camacho Rodriguez commented on HIVE-17137: +1 > Fix javolution conflict > --- > > Key: HIVE-17137 > URL: https://issues.apache.org/jira/browse/HIVE-17137 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-17137.01.patch > > > as reported by [~jcamachorodriguez] > {code} > [WARNING] Some problems were encountered while building the effective model > for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT > [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must > be unique: javolution:javolution:jar -> duplicate declaration of version > ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], > /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, > column 17 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17137) Fix javolution conflict
[ https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-17137: --- Status: Patch Available (was: Open) > Fix javolution conflict > --- > > Key: HIVE-17137 > URL: https://issues.apache.org/jira/browse/HIVE-17137 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-17137.01.patch > > > as reported by [~jcamachorodriguez] > {code} > [WARNING] Some problems were encountered while building the effective model > for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT > [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must > be unique: javolution:javolution:jar -> duplicate declaration of version > ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], > /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, > column 17 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17137) Fix javolution conflict
[ https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-17137: --- Attachment: HIVE-17137.01.patch > Fix javolution conflict > --- > > Key: HIVE-17137 > URL: https://issues.apache.org/jira/browse/HIVE-17137 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-17137.01.patch > > > as reported by [~jcamachorodriguez] > {code} > [WARNING] Some problems were encountered while building the effective model > for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT > [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must > be unique: javolution:javolution:jar -> duplicate declaration of version > ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], > /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, > column 17 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17137) Fix javolution conflict
[ https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095065#comment-16095065 ] Pengcheng Xiong commented on HIVE-17137: [~jcamachorodriguez], could u review the patch? thanks! > Fix javolution conflict > --- > > Key: HIVE-17137 > URL: https://issues.apache.org/jira/browse/HIVE-17137 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-17137.01.patch > > > as reported by [~jcamachorodriguez] > {code} > [WARNING] Some problems were encountered while building the effective model > for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT > [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must > be unique: javolution:javolution:jar -> duplicate declaration of version > ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], > /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, > column 17 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17137) Fix javolution conflict
[ https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-17137: -- > Fix javolution conflict > --- > > Key: HIVE-17137 > URL: https://issues.apache.org/jira/browse/HIVE-17137 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > as reported by [~jcamachorodriguez] > {code} > [WARNING] Some problems were encountered while building the effective model > for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT > [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must > be unique: javolution:javolution:jar -> duplicate declaration of version > ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], > /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, > column 17 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095044#comment-16095044 ] Pengcheng Xiong commented on HIVE-16996: yes, i also saw that just now, I think it is due to my problem. I will take a look and put a patch there. thanks for discovering this! > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, > HIVE-16966.06.patch, HIVE-16966.07.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17085) ORC file merge/concatenation should do full schema check
[ https://issues.apache.org/jira/browse/HIVE-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095019#comment-16095019 ] Zoltan Haindrich commented on HIVE-17085: - +1 > ORC file merge/concatenation should do full schema check > > > Key: HIVE-17085 > URL: https://issues.apache.org/jira/browse/HIVE-17085 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0, 2.3.0, 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17085.1.patch, HIVE-17085.2.patch > > > ORC merging/concatenation compatibility check just looks for column count > match at outer level. ORC schema evolution now supports inner structs as > well. With that outer level column count will match but inner column level > will not match. Compatibility check should do full schema match before > merging/concatenation. This issue will not cause data loss but will cause > task failures with exception like below > {code} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to close > OrcFileMergeOperator > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:247) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212) > ... 
16 more > Caused by: java.lang.IllegalArgumentException: Column has wrong number of > index entries found: 0 expected: 1 > at > org.apache.orc.impl.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:695) > at > org.apache.orc.impl.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:2147) > at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2661) > at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834) > at > org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:243) > ... 19 more > {code} > Concatenation should also make sure writer version is matching (it currently > checks only file version match). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
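The shallow-vs-full check that HIVE-17085 describes can be illustrated with a toy type model (this is not the real ORC TypeDescription API): two schemas agree on top-level column count yet differ inside a nested struct, so only a recursive comparison correctly refuses the merge.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a tiny nested type model showing why an outer
// column-count check lets incompatible ORC files through.
public class SchemaCheckSketch {
    static class Type {
        final String name;
        final List<Type> children;
        Type(String name, Type... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Current check: only the top-level column counts are compared.
    static boolean shallowCompatible(Type a, Type b) {
        return a.children.size() == b.children.size();
    }

    // Proposed check: recurse through the full schema.
    static boolean fullCompatible(Type a, Type b) {
        if (!a.name.equals(b.name) || a.children.size() != b.children.size()) return false;
        for (int i = 0; i < a.children.size(); i++) {
            if (!fullCompatible(a.children.get(i), b.children.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Same two top-level columns, but the inner struct differs.
        Type fileA = new Type("struct", new Type("int"),
                              new Type("struct", new Type("string")));
        Type fileB = new Type("struct", new Type("int"),
                              new Type("struct", new Type("string"), new Type("int")));
        System.out.println(shallowCompatible(fileA, fileB)); // files would be merged
        System.out.println(fullCompatible(fileA, fileB));    // merge correctly refused
    }
}
```

Only the recursive check catches the nested mismatch, which is what lets concatenation fail up front instead of inside the ORC writer at stripe-flush time.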
[jira] [Updated] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS
[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-17001: --- Status: Open (was: Patch Available) Cancelling the patch as after some discussions it was decided that this should not be an issue. Data in the directory could be copied there on purpose by the user and should not be deleted without a warning. > Insert overwrite table doesn't clean partition directory on HDFS if partition > is missing from HMS > - > > Key: HIVE-17001 > URL: https://issues.apache.org/jira/browse/HIVE-17001 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-17001.01.patch > > > Insert overwrite table should clear existing data before creating the new > data files. > For a partitioned table we will clean any folder of existing partitions on > HDFS, however if the partition folder exists only on HDFS and the partition > definition is missing in HMS, the folder is not cleared. > Reproduction steps: > 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string); > 2. INSERT INTO test PARTITION(ds='p1') values ('a'); > 3. Copy the data to a different folder with different name. > 4. ALTER TABLE test DROP PARTITION (ds='p1'); > 5. Recreate the partition directory, copy and rename the data file back > 6. INSERT OVERWRITE TABLE test PARTITION(ds='p1') values ('b'); > 7. SELECT * from test; > will result in 2 records being returned instead of 1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17085) ORC file merge/concatenation should do full schema check
[ https://issues.apache.org/jira/browse/HIVE-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094937#comment-16094937 ] Prasanth Jayachandran commented on HIVE-17085: -- [~gopalv] Can you please review this patch? > ORC file merge/concatenation should do full schema check > > > Key: HIVE-17085 > URL: https://issues.apache.org/jira/browse/HIVE-17085 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0, 2.3.0, 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17085.1.patch, HIVE-17085.2.patch > > > ORC merging/concatenation compatibility check just looks for column count > match at outer level. ORC schema evolution now supports inner structs as > well. With that outer level column count will match but inner column level > will not match. Compatibility check should do full schema match before > merging/concatenation. This issue will not cause data loss but will cause > task failures with exception like below > {code} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to close > OrcFileMergeOperator > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:247) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212) > ... 
16 more > Caused by: java.lang.IllegalArgumentException: Column has wrong number of > index entries found: 0 expected: 1 > at > org.apache.orc.impl.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:695) > at > org.apache.orc.impl.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:2147) > at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2661) > at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834) > at > org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:243) > ... 19 more > {code} > Concatenation should also make sure writer version is matching (it currently > checks only file version match). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17135) Bad error messages when beeline connects to unreachable hosts using binary and SSL
[ https://issues.apache.org/jira/browse/HIVE-17135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094887#comment-16094887 ] Carter Shanklin commented on HIVE-17135: One more problem: Connection Refused is also not reported properly when using binary and SSL Compare: Error: Could not open client transport with JDBC Uri: jdbc:hive2://hdp261.example.com:10003/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef: Could not connect to hdp261.example.com on port 10003 (state=08S01,code=0) Versus: Error: Could not open client transport with JDBC Uri: jdbc:hive2://hdp261.example.com:10003/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;transportMode=http;httpPath=cliservice: Could not establish connection to jdbc:hive2://hdp261.example.com:10003/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;transportMode=http;httpPath=cliservice: org.apache.http.conn.HttpHostConnectException: Connect to hdp261.example.com:10003 [hdp261.example.com/192.168.59.21] failed: Connection refused (Connection refused) (state=08S01,code=0) > Bad error messages when beeline connects to unreachable hosts using binary > and SSL > -- > > Key: HIVE-17135 > URL: https://issues.apache.org/jira/browse/HIVE-17135 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Carter Shanklin > Attachments: Screen Shot 2017-07-20 at 08.42.07.png > > > When you attempt to connect beeline to an unreachable host using both binary > transport and SSL you get a generic / unhelpful error message. > If you use HTTP or you don't use SSL (binary or HTTP) you get a descriptive > error message. > "Network is unreachable" <- for unroutable destinations > "Connection timed out" <- for hosts that fail to respond for whatever reason. > See attached image for the matrix. > It would be better if binary+SSL gave the same descriptive errors -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17136) Unhelpful beeline error message when you attempt to connect to HTTP HS2 using binary with SSL enabled
[ https://issues.apache.org/jira/browse/HIVE-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carter Shanklin updated HIVE-17136: --- Description: In this case the error message is "Invalid status 72". Full error: Error: Could not open client transport with JDBC Uri: jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: Invalid status 72 (state=08S01,code=0) In my environment the connection works if I add transportMode=http. Compare this error to the error you get if you try to connect to something that is not HiveServer2 like SSH: Error: Could not open client transport with JDBC Uri: jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? (state=08S01,code=0) If you got a similar error when you connect to HS2 it would be a lot easier to diagnose. was: In this case the error message is "Invalid status 72". Full error: Error: Could not open client transport with JDBC Uri: jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: Invalid status 72 (state=08S01,code=0) In my environment the connection works if I add transportMode=http. Compare this error to the error you get if you try to connect to something that is not HiveServer2 like SSH: Error: Could not open client transport with JDBC Uri: jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? (state=08S01,code=0) If you got this error when you connect to HS2 it would be a lot easier to diagnose. 
> Unhelpful beeline error message when you attempt to connect to HTTP HS2 using > binary with SSL enabled > - > > Key: HIVE-17136 > URL: https://issues.apache.org/jira/browse/HIVE-17136 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Carter Shanklin > > In this case the error message is "Invalid status 72". > Full error: > Error: Could not open client transport with JDBC Uri: > jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: > Invalid status 72 (state=08S01,code=0) > In my environment the connection works if I add transportMode=http. > Compare this error to the error you get if you try to connect to something > that is not HiveServer2 like SSH: > Error: Could not open client transport with JDBC Uri: > jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: > javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? > (state=08S01,code=0) > If you got a similar error when you connect to HS2 it would be a lot easier > to diagnose. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17136) Unhelpful beeline error message when you attempt to connect to HTTP HS2 using binary with SSL enabled
[ https://issues.apache.org/jira/browse/HIVE-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carter Shanklin updated HIVE-17136: --- Component/s: Beeline > Unhelpful beeline error message when you attempt to connect to HTTP HS2 using > binary with SSL enabled > - > > Key: HIVE-17136 > URL: https://issues.apache.org/jira/browse/HIVE-17136 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Carter Shanklin > > In this case the error message is "Invalid status 72". > Full error: > Error: Could not open client transport with JDBC Uri: > jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: > Invalid status 72 (state=08S01,code=0) > In my environment the connection works if I add transportMode=http. > Compare this error to the error you get if you try to connect to something > that is not HiveServer2 like SSH: > Error: Could not open client transport with JDBC Uri: > jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice: > javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? > (state=08S01,code=0) > If you got this error when you connect to HS2 it would be a lot easier to > diagnose. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17135) Bad error messages when beeline connects to unreachable hosts using binary and SSL
[ https://issues.apache.org/jira/browse/HIVE-17135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carter Shanklin updated HIVE-17135: --- Attachment: Screen Shot 2017-07-20 at 08.42.07.png > Bad error messages when beeline connects to unreachable hosts using binary > and SSL > -- > > Key: HIVE-17135 > URL: https://issues.apache.org/jira/browse/HIVE-17135 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Carter Shanklin > Attachments: Screen Shot 2017-07-20 at 08.42.07.png > > > When you attempt to connect beeline to an unreachable host using both binary > transport and SSL you get a generic / unhelpful error message. > If you use HTTP or you don't use SSL (binary or HTTP) you get a descriptive > error message. > "Network is unreachable" <- for unroutable destinations > "Connection timed out" <- for hosts that fail to respond for whatever reason. > See attached image for the matrix. > It would be better if binary+SSL gave the same descriptive errors -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17090) spark.only.query.files are not being run by ptest
[ https://issues.apache.org/jira/browse/HIVE-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17090: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > spark.only.query.files are not being run by ptest > - > > Key: HIVE-17090 > URL: https://issues.apache.org/jira/browse/HIVE-17090 > Project: Hive > Issue Type: Bug > Components: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 3.0.0 > > Attachments: HIVE-17090.patch > > > Checked a recent run of Hive QA and it doesn't look like qtests specified in > spark.only.query.files are being run. > I think some modifications to ptest config files are required to get this > working - e.g. the deployed master-m2.properties file for ptest should > contain mainProperties.$\{spark.only.query.files} in the > qFileTest.miniSparkOnYarn.groups.normal key. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17090) spark.only.query.files are not being run by ptest
[ https://issues.apache.org/jira/browse/HIVE-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094850#comment-16094850 ] Sahil Takiar commented on HIVE-17090: - Committed to master, thanks Sergio! > spark.only.query.files are not being run by ptest > - > > Key: HIVE-17090 > URL: https://issues.apache.org/jira/browse/HIVE-17090 > Project: Hive > Issue Type: Bug > Components: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 3.0.0 > > Attachments: HIVE-17090.patch > > > Checked a recent run of Hive QA and it doesn't look like qtests specified in > spark.only.query.files are being run. > I think some modifications to ptest config files are required to get this > working - e.g. the deployed master-m2.properties file for ptest should > contain mainProperties.$\{spark.only.query.files} in the > qFileTest.miniSparkOnYarn.groups.normal key. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17122) spark_vectorized_dynamic_partition_pruning.q is continuously failing
[ https://issues.apache.org/jira/browse/HIVE-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094828#comment-16094828 ] Vihang Karajgaonkar commented on HIVE-17122: I will spend some time today to look into this and update if I find anything. > spark_vectorized_dynamic_partition_pruning.q is continuously failing > > > Key: HIVE-17122 > URL: https://issues.apache.org/jira/browse/HIVE-17122 > Project: Hive > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > {code} > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > Caused by: java.lang.RuntimeException: Hive Runtime Error while closing > operators: 1 > at > 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:616) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:67) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:96) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1037) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.forwardBatch(SparkReduceRecordHandler.java:542) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:584) > ... 11 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17122) spark_vectorized_dynamic_partition_pruning.q is continuously failing
[ https://issues.apache.org/jira/browse/HIVE-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094823#comment-16094823 ] Sahil Takiar commented on HIVE-17122: - Thanks for pointing that out [~kellyzly]. Sounds we have been hitting this issue for a while. I tried debugging the code a bit, and didn't find anything obvious. Will update this JIRA if I find something. CC: [~vihangk1] if you have any idea on this, let us know. > spark_vectorized_dynamic_partition_pruning.q is continuously failing > > > Key: HIVE-17122 > URL: https://issues.apache.org/jira/browse/HIVE-17122 > Project: Hive > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > {code} > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607) > at 
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > Caused by: java.lang.RuntimeException: Hive Runtime Error while closing > operators: 1 > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:616) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:67) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:96) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1037) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.forwardBatch(SparkReduceRecordHandler.java:542) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:584) 
> ... 11 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16787) Fix itests in branch-2.2
[ https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094815#comment-16094815 ] Alan Gates commented on HIVE-16787: --- +1 > Fix itests in branch-2.2 > > > Key: HIVE-16787 > URL: https://issues.apache.org/jira/browse/HIVE-16787 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > > The itests are broken in branch 2.2 and need to be fixed before release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17034) The spark tar for itests is downloaded every time if md5sum is not installed
[ https://issues.apache.org/jira/browse/HIVE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094726#comment-16094726 ] Zoltan Haindrich commented on HIVE-17034: - [~lirui] I think the new solution will never download anything when md5sum is not installed... It would be better to fail the build in that case - I guess that currently, anyone who doesn't have md5sum accessible on the path will end up with cryptic messages about the Spark tarball being unavailable... > The spark tar for itests is downloaded every time if md5sum is not installed > > > Key: HIVE-17034 > URL: https://issues.apache.org/jira/browse/HIVE-17034 > Project: Hive > Issue Type: Test > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17034.1.patch > > > I think we should either skip verifying the md5, or fail the build to let > the developer know md5sum is required. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
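For reference, the verification being discussed does not strictly require the external md5sum binary at all - it can be done in the JVM via MessageDigest. A minimal sketch (the class and method names here are mine, not anything in Hive's build, which drives this from shell in the pom): compute the tarball's MD5 and fail loudly on mismatch, instead of silently skipping the check or re-downloading.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: verify a downloaded tarball's MD5 without shelling
// out to md5sum, failing the build loudly on mismatch.
public final class Md5Check {

    // Hex-encoded MD5 of a byte array. MD5 is guaranteed to be present in
    // every conforming JDK, so the checked exception cannot actually occur.
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is required by the JDK spec", e);
        }
    }

    // Fail fast on mismatch rather than quietly proceeding (or re-downloading).
    public static void verifyOrFail(Path tarball, String expectedMd5) throws IOException {
        String actual = md5Hex(Files.readAllBytes(tarball));
        if (!actual.equalsIgnoreCase(expectedMd5)) {
            throw new IllegalStateException(
                "MD5 mismatch for " + tarball + ": expected " + expectedMd5 + ", got " + actual);
        }
    }
}
```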
[jira] [Commented] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others
[ https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094705#comment-16094705 ] Daniel Voros commented on HIVE-16222: - [~sershe], [~leftylev] is right, this hasn't been committed yet. > add a setting to disable row.serde for specific formats; enable for others > -- > > Key: HIVE-16222 > URL: https://issues.apache.org/jira/browse/HIVE-16222 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, > HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, > HIVE-16222.patch > > > Per [~gopalv] > {quote} > row.serde = true ... breaks Parquet (they expect to get the same object back, > which means you can't buffer 1024 rows). > {quote} > We want to enable this and vector.serde for text vectorization. Need to turn > it off for specific formats. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others
[ https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reopened HIVE-16222: - > add a setting to disable row.serde for specific formats; enable for others > -- > > Key: HIVE-16222 > URL: https://issues.apache.org/jira/browse/HIVE-16222 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, > HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, > HIVE-16222.patch > > > Per [~gopalv] > {quote} > row.serde = true ... breaks Parquet (they expect to get the same object back, > which means you can't buffer 1024 rows). > {quote} > We want to enable this and vector.serde for text vectorization. Need to turn > it off for specific formats. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed
[ https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094640#comment-16094640 ] Hive QA commented on HIVE-17114: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878159/HIVE-17114.3.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11087 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6096/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6096/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6096/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12878159 - PreCommit-HIVE-Build > HoS: Possible skew in shuffling when data is not really skewed > -- > > Key: HIVE-17114 > URL: https://issues.apache.org/jira/browse/HIVE-17114 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, > HIVE-17114.3.patch > > > Observed in HoS and may apply to other engines as well. > When we join 2 tables on a single int key, we use the key itself as hash code > in {{ObjectInspectorUtils.hashCode}}: > {code} > case INT: > return ((IntObjectInspector) poi).get(o); > {code} > Suppose the keys are different but are all some multiples of 10. And if we > choose 10 as #reducers, the shuffle will be skewed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
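The skew described in HIVE-17114 is easy to reproduce outside Hive. The sketch below is illustrative Java, not Hive's actual shuffle code (the class and method names are mine): with ObjectInspectorUtils-style hashing, an int key hashes to itself, so the target reducer is effectively key % numReducers, and distinct keys that share a common factor with the reducer count pile onto a few buckets.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative demo of the skew: 100 distinct keys, all multiples of 10,
// hashed "as themselves" into 10 reducers - every record lands on reducer 0.
public final class IntKeySkewDemo {

    public static Map<Integer, Integer> bucketCounts(int[] keys, int numReducers) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int key : keys) {
            // Mimics hashCode == key; mask keeps the bucket non-negative.
            int reducer = (key & Integer.MAX_VALUE) % numReducers;
            counts.merge(reducer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = new int[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i * 10; // distinct keys, all multiples of 10
        }
        // All 100 distinct keys map to reducer 0; the other 9 reducers are idle.
        System.out.println(bucketCounts(keys, 10)); // prints {0=100}
    }
}
```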
[jira] [Updated] (HIVE-17037) Use 1-to-1 Tez edge to avoid unnecessary input data shuffle
[ https://issues.apache.org/jira/browse/HIVE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17037: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master, thanks [~ashutoshc]! > Use 1-to-1 Tez edge to avoid unnecessary input data shuffle > --- > > Key: HIVE-17037 > URL: https://issues.apache.org/jira/browse/HIVE-17037 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17037.01.patch, HIVE-17037.02.patch, > HIVE-17037.03.patch, HIVE-17037.patch > > > As an example, consider the following query: > {code:sql} > SELECT * > FROM ( > SELECT a.value > FROM src1 a > JOIN src1 b > ON (a.value = b.value) > GROUP BY a.value > ) a > JOIN src > ON (a.value = src.value); > {code} > Currently, the plan generated for Tez will contain an unnecessary shuffle > operation between the subquery and the join, since the records produced by > the subquery are already sorted by the value. > This issue is to extend join algorithm selection to be able to shuffle only > some of the inputs for a given join and avoid unnecessary shuffle operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
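The algorithm-selection idea in HIVE-17037 can be illustrated with a toy decision rule. This is a hypothetical sketch (EdgeTypeChooser and its method are my names, not Hive's or Tez's API): if a join input is already partitioned and sorted on the join keys by its producer - as the subquery grouped by a.value is in the example above - a 1-to-1 edge can forward its partitions as-is instead of re-shuffling them.

```java
import java.util.List;

// Hypothetical illustration of the edge-selection rule: pick a 1-to-1 edge
// when the producer's partition/sort keys already match the join keys,
// otherwise fall back to a shuffle edge.
public final class EdgeTypeChooser {

    enum EdgeType { ONE_TO_ONE, SHUFFLE }

    static EdgeType chooseEdge(List<String> producerSortKeys, List<String> joinKeys) {
        // The GROUP BY a.value subquery already partitions and sorts its
        // output on a.value, which is exactly the join key - so a shuffle
        // between the subquery and the join adds nothing.
        return producerSortKeys.equals(joinKeys) ? EdgeType.ONE_TO_ONE : EdgeType.SHUFFLE;
    }
}
```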
[jira] [Commented] (HIVE-16945) Add method to compare Operators
[ https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094596#comment-16094596 ] Jesus Camacho Rodriguez commented on HIVE-16945: [~lirui], sure, that would be great! I can review it once it is ready. > Add method to compare Operators > > > Key: HIVE-16945 > URL: https://issues.apache.org/jira/browse/HIVE-16945 > Project: Hive > Issue Type: Improvement > Components: Operators >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez > > HIVE-10844 introduced a comparator factory class for operators that > encapsulates all the logic to assess whether two operators are equal: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java > The current design might create problems as any change in fields of operators > will break the comparators. It would be better to do this via inheritance > from Operator base class, by adding a {{logicalEquals(Operator other)}} > method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-16945) Add method to compare Operators
[ https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094549#comment-16094549 ] Jesus Camacho Rodriguez edited comment on HIVE-16945 at 7/20/17 12:13 PM: -- [~lirui], thanks for the feedback. I had not started working on this issue yet, I created the placeholder to keep it in mind. I guess overriding equals/hashCode would create a bunch of issues since codebase has relied on identity comparison for Operator objects, thus I would create a new method indeed. I agree with you that {{compareTo}} is not a good name for it, {{logicalEquals}} would be better. I have changed it in the description. was (Author: jcamachorodriguez): [~lirui], thanks for the feedback. I had not started working on this issue yet, I created the placeholder to keep it in mind. I guess overriding equals/hashCode would create a bunch of issues since codebase has relied on identity comparison for this objects, thus I would create a new method indeed. I agree with you that {{compareTo}} is not a good name for it, {{logicalEquals}} would be better. I have changed it in the description. > Add method to compare Operators > > > Key: HIVE-16945 > URL: https://issues.apache.org/jira/browse/HIVE-16945 > Project: Hive > Issue Type: Improvement > Components: Operators >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez > > HIVE-10844 introduced a comparator factory class for operators that > encapsulates all the logic to assess whether two operators are equal: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java > The current design might create problems as any change in fields of operators > will break the comparators. It would be better to do this via inheritance > from Operator base class, by adding a {{logicalEquals(Operator other)}} > method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16945) Add method to compare Operators
[ https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li reassigned HIVE-16945: - Assignee: Rui Li > Add method to compare Operators > > > Key: HIVE-16945 > URL: https://issues.apache.org/jira/browse/HIVE-16945 > Project: Hive > Issue Type: Improvement > Components: Operators >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Rui Li > > HIVE-10844 introduced a comparator factory class for operators that > encapsulates all the logic to assess whether two operators are equal: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java > The current design might create problems as any change in fields of operators > will break the comparators. It would be better to do this via inheritance > from Operator base class, by adding a {{logicalEquals(Operator other)}} > method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
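The inheritance-based design proposed in the HIVE-16945 description can be sketched with simplified stand-in classes (these are not Hive's real Operator hierarchy; the names and fields are illustrative): the base class compares what all operators share, and each subclass overrides logicalEquals to add its own fields. A new field then gets compared where it is declared rather than in a separate comparator factory, and identity-based equals/hashCode stays untouched, which sidesteps the code that relies on identity comparison.

```java
import java.util.Objects;

// Illustrative stand-in for an Operator base class with a logicalEquals
// method, separate from identity-based equals/hashCode.
abstract class Op {
    final String name;

    Op(String name) { this.name = name; }

    // Base comparison: same concrete class and same common fields.
    boolean logicalEquals(Op other) {
        return other != null
            && getClass() == other.getClass()
            && Objects.equals(name, other.name);
    }
}

// A subclass only adds checks for its own fields on top of super's.
class FilterOp extends Op {
    final String predicate;

    FilterOp(String name, String predicate) {
        super(name);
        this.predicate = predicate;
    }

    @Override
    boolean logicalEquals(Op other) {
        return super.logicalEquals(other)
            && Objects.equals(predicate, ((FilterOp) other).predicate);
    }
}
```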
[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094597#comment-16094597 ] Jesus Camacho Rodriguez commented on HIVE-16996: [~pxiong], I am seeing the following warning when I compile the project: {noformat} [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: javolution:javolution:jar -> duplicate declaration of version ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, column 17 {noformat} Might be related to this change? Thanks > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, > HIVE-16966.06.patch, HIVE-16966.07.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16945) Add method to compare Operators
[ https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094594#comment-16094594 ] Rui Li commented on HIVE-16945: --- Thanks [~jcamachorodriguez] for the explanations. I think I can help. Would you mind if I work on it? > Add method to compare Operators > > > Key: HIVE-16945 > URL: https://issues.apache.org/jira/browse/HIVE-16945 > Project: Hive > Issue Type: Improvement > Components: Operators >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez > > HIVE-10844 introduced a comparator factory class for operators that > encapsulates all the logic to assess whether two operators are equal: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java > The current design might create problems as any change in fields of operators > will break the comparators. It would be better to do this via inheritance > from Operator base class, by adding a {{logicalEquals(Operator other)}} > method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed
[ https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094588#comment-16094588 ] Rui Li commented on HIVE-17114: --- {{constprog_semijoin}} needs its query results sorted. The other failures are not related. Updated patch v3 to address it. [~xuefuz], [~csun] would you mind taking a look? Thanks. The idea is to set the UNIFORM trait on the RS in {{SetSparkReducerParallelism}} when the number of reducers is decided automatically. Most of the code change is refactoring to make things more concise. > HoS: Possible skew in shuffling when data is not really skewed > -- > > Key: HIVE-17114 > URL: https://issues.apache.org/jira/browse/HIVE-17114 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, > HIVE-17114.3.patch > > > Observed in HoS and may apply to other engines as well. > When we join 2 tables on a single int key, we use the key itself as hash code > in {{ObjectInspectorUtils.hashCode}}: > {code} > case INT: > return ((IntObjectInspector) poi).get(o); > {code} > Suppose the keys are different but are all some multiples of 10. And if we > choose 10 as #reducers, the shuffle will be skewed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16924) Support distinct in presence Gby
[ https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094576#comment-16094576 ] Hive QA commented on HIVE-16924: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878116/HIVE-16924.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 11088 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_3] (batchId=25) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_5] (batchId=63) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[distinct_gby] (batchId=56) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_duplicate_key] (batchId=6) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[having2] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_distinct] (batchId=54) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_join] (batchId=19) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_3] (batchId=19) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_5] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_null_projection] (batchId=9) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk] (batchId=98) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[global_limit] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_unionDistinct_2] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] 
(batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[selectDistinctStar] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_cache] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[unionDistinct_2] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_3] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_5] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_null_projection] (batchId=146) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_ptf] (batchId=157) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[unionDistinct_2] (batchId=99) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[selectDistinctStarNeg_2] (batchId=89) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udaf_invalid_place] (batchId=89) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] 
(batchId=130) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_distinct] (batchId=124) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_join] (batchId=109) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf] (batchId=108) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=128) org.apache.hadoop.hive.ql.parse.TestParseNegativeDriver.testCliDriver[wrong_distinct1] (batchId=239) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6095/testReport Console output:
[jira] [Updated] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed
[ https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-17114: -- Attachment: HIVE-17114.3.patch > HoS: Possible skew in shuffling when data is not really skewed > -- > > Key: HIVE-17114 > URL: https://issues.apache.org/jira/browse/HIVE-17114 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, > HIVE-17114.3.patch > > > Observed in HoS and may apply to other engines as well. > When we join 2 tables on a single int key, we use the key itself as hash code > in {{ObjectInspectorUtils.hashCode}}: > {code} > case INT: > return ((IntObjectInspector) poi).get(o); > {code} > Suppose the keys are different but are all some multiples of 10. And if we > choose 10 as #reducers, the shuffle will be skewed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)