[jira] [Created] (HIVE-23973) Use SQL constraints to improve join reordering algorithm (III)
Jesus Camacho Rodriguez created HIVE-23973:

Summary: Use SQL constraints to improve join reordering algorithm (III)
Key: HIVE-23973
URL: https://issues.apache.org/jira/browse/HIVE-23973
Project: Hive
Issue Type: Improvement
Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23972) Add external client ID to LLAP external client
Jason Dere created HIVE-23972:

Summary: Add external client ID to LLAP external client
Key: HIVE-23972
URL: https://issues.apache.org/jira/browse/HIVE-23972
Project: Hive
Issue Type: Bug
Components: llap
Reporter: Jason Dere
Assignee: Jason Dere

There is currently no good way to tell which running LLAP tasks come from external LLAP clients, and no good way to know which application is submitting these external LLAP requests. One possible solution is to add an option for the external LLAP client to pass in an external client ID, which can be logged by HiveServer2 during the getSplits request and displayed in the LLAP executorsStatus.

cc [~ShubhamChaurasia]
[jira] [Created] (HIVE-23971) Cleanup unreleased method signatures in IMetastoreClient
Vihang Karajgaonkar created HIVE-23971:

Summary: Cleanup unreleased method signatures in IMetastoreClient
Key: HIVE-23971
URL: https://issues.apache.org/jira/browse/HIVE-23971
Project: Hive
Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar

There are many methods in IMetastoreClient which are simply wrappers around another method. The code has become very intertwined and needs some cleanup. For instance, I see the following variations of {{getPartitionsByNames}} in {{IMetastoreClient}}:
{noformat}
List<Partition> getPartitionsByNames(String db_name, String tbl_name, List<String> part_names, boolean getColStats, String engine)
List<Partition> getPartitionsByNames(String catName, String db_name, String tbl_name, List<String> part_names)
List<Partition> getPartitionsByNames(String catName, String db_name, String tbl_name, List<String> part_names, boolean getColStats, String engine)
{noformat}
The problem seems to be that every time a new field is added to the request object {{GetPartitionsByNamesRequest}}, a new variant is introduced in IMetastoreClient. Many of these methods are not released yet, and it would be good to clean them up by using the request object as the method argument instead of individual fields. Once we release, we will not be able to change the method signatures, since IMetastoreClient is annotated as a public API.
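The request-object approach described in HIVE-23971 can be sketched roughly as follows. This is only an illustration under stated assumptions: `PartitionsByNamesRequestSketch` and `PartitionClientSketch` are hypothetical names, not the real Thrift-generated `GetPartitionsByNamesRequest` or the actual `IMetastoreClient` interface, and partitions are represented as plain strings to keep the example self-contained. The point is that a new optional field becomes a new setter on the request rather than a new method overload.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical request object (an assumption for illustration, not Hive's real class).
final class PartitionsByNamesRequestSketch {
    // Required fields are fixed at construction time.
    private final String dbName;
    private final String tblName;
    private final List<String> partNames;
    // Optional fields: adding another one later does not change any method signature.
    private String catName;
    private boolean getColStats;
    private String engine;

    PartitionsByNamesRequestSketch(String dbName, String tblName, List<String> partNames) {
        this.dbName = dbName;
        this.tblName = tblName;
        this.partNames = Collections.unmodifiableList(new ArrayList<>(partNames));
    }

    // Fluent setters return the request so optional fields can be chained.
    PartitionsByNamesRequestSketch setCatName(String catName) { this.catName = catName; return this; }
    PartitionsByNamesRequestSketch setGetColStats(boolean getColStats) { this.getColStats = getColStats; return this; }
    PartitionsByNamesRequestSketch setEngine(String engine) { this.engine = engine; return this; }

    String getCatName() { return catName; }
    String getDbName() { return dbName; }
    String getTblName() { return tblName; }
    List<String> getPartNames() { return partNames; }
    boolean isGetColStats() { return getColStats; }
    String getEngine() { return engine; }
}

// Hypothetical client interface: a single method takes the place of the overloads.
// The String elements in the result stand in for partition objects.
interface PartitionClientSketch {
    List<String> getPartitionsByNames(PartitionsByNamesRequestSketch request);
}
```

With this shape, a caller that needs column statistics would build the request once, for example `new PartitionsByNamesRequestSketch(db, tbl, names).setGetColStats(true).setEngine(engine)`, and callers that do not care about the optional fields simply leave them unset.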
[jira] [Created] (HIVE-23970) Reject database creation if managedlocation is incorrect
Naveen Gangam created HIVE-23970:

Summary: Reject database creation if managedlocation is incorrect
Key: HIVE-23970
URL: https://issues.apache.org/jira/browse/HIVE-23970
Project: Hive
Issue Type: Sub-task
Components: Hive
Affects Versions: 4.0.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam

Due to some changes in HIVE-23387, the managed location check gets bypassed. This needs to be fixed.
[jira] [Created] (HIVE-23969) Table owner info not being passed during show tables in database.
Sai Hemanth Gantasala created HIVE-23969:

Summary: Table owner info not being passed during show tables in database.
Key: HIVE-23969
URL: https://issues.apache.org/jira/browse/HIVE-23969
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala
Attachments: Screen Shot 2020-07-31 at 10.55.51 AM.png, Screen Shot 2020-07-31 at 10.56.25 AM.png, Screen Shot 2020-07-31 at 10.56.51 AM.png

Table owner information is not being passed in HiveMetaStore. As a result, even though a user is the owner of some tables, without a Ranger policy the user is unable to view the tables they created themselves.
Re: Hive TPC-DS metastore dumps in Postgres
There is now a PR [1] with various improvements over the last update. Feel free to check it out and let me know what you think.

Best,
Stamatis

[1] https://github.com/apache/hive/pull/1347

On Mon, Jun 22, 2020 at 5:32 PM Stamatis Zampetakis wrote:
> Hey guys,
>
> I put up a small project on GitHub [1] with Hive metastore dumps from tpcds10tb/tpcds30tb (+partitioning) and some scripts to quickly spin up a dockerized Postgres with those loaded.
>
> Personally, I find it useful to check the plans of TPC-DS queries using the usual qtest mechanism (without external tools and tapping into a real cluster) having at hand beefy stats + partitioning info. The driver and other changes needed to run these tests are located in [2].
>
> I am sharing it here in case it might be of use to somebody else.
>
> The two main commands that you will need if you wanna try this out:
> docker build --tag postgres-tpcds-metastore:1.0 .
> mvn test -Dtest=TestTezPerfDBCliDriver -Dtest.output.overwrite=true -Dtest.metastore.db=postgres.tpcds
>
> Small caveat: Currently in [2] the dockerized postgres is restarted for every query which makes things slow. This will be fixed later on.
>
> Best,
> Stamatis
>
> [1] https://github.com/zabetak/hive-postgres-metastore
> [2] https://github.com/zabetak/hive/tree/qtest_postgres_driver
[jira] [Created] (HIVE-23968) CTAS with TBLPROPERTIES ('transactional'='false') does not entertain translated table location
Rajkumar Singh created HIVE-23968:

Summary: CTAS with TBLPROPERTIES ('transactional'='false') does not entertain translated table location
Key: HIVE-23968
URL: https://issues.apache.org/jira/browse/HIVE-23968
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 4.0.0
Reporter: Rajkumar Singh

The HMS translation layer converts the table to an external table because the transactional property is set to false, but MoveTask does not honor the translated table location and moves the data to the managed table location.

Steps to reproduce:
{code:java}
create table nontxnal TBLPROPERTIES ('transactional'='false') as select * from abc;
{code}
A select query on the table returns nothing, but the source table has data in it.
{code:java}
select * from nontxnal;
+--+
| nontxnal.id |
+--+
+--+
{code}
Output of show create table:
{code:java}
| CREATE EXTERNAL TABLE `nontxnal`( |
| `id` int) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION |
| 'hdfs://hostname:8020/warehouse/tablespace/external/hive/nontxnal' |
| TBLPROPERTIES ( |
| 'TRANSLATED_TO_EXTERNAL'='TRUE', |
| 'bucketing_version'='2', |
| 'external.table.purge'='TRUE', |
| 'transient_lastDdlTime'='1596215634') |
{code}
The table data is moved to the managed location:
```
dfs -ls -R hdfs://hostname:8020/warehouse/tablespace/managed/hive/nontxnal
. . . . . . . . . . . . . . . . . . . . . . .> ;
++
| DFS Output |
++
| -rw-rw+ 3 hive hadoop201 2020-07-31 17:05 hdfs://hostname:8020/warehouse/tablespace/managed/hive/nontxnal/00_0 |
++
```
[jira] [Created] (HIVE-23967) Give HiveMetastore thread pool a more descriptive name
Sam An created HIVE-23967:

Summary: Give HiveMetastore thread pool a more descriptive name
Key: HIVE-23967
URL: https://issues.apache.org/jira/browse/HIVE-23967
Project: Hive
Issue Type: Improvement
Components: Hive
Affects Versions: 4.0.0
Reporter: Sam An
Assignee: Sam An

Currently the HiveMetastore thread pool uses the default generic name format pool-id-thread-id. We should use a more descriptive one.
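One common way to get descriptive thread names is a custom `ThreadFactory`; a minimal sketch follows. The class name `NamedThreadFactorySketch` and the prefix used in the usage note are assumptions for illustration only and do not reflect the actual HiveMetastore change.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory that names threads "<prefix>-<n>" instead of the
// default "pool-N-thread-M" produced by Executors.defaultThreadFactory().
final class NamedThreadFactorySketch implements ThreadFactory {
    private final String prefix;
    // Thread-safe counter so concurrently created threads get distinct numbers.
    private final AtomicInteger counter = new AtomicInteger(0);

    NamedThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // The second constructor argument sets the thread name seen in logs and thread dumps.
        return new Thread(task, prefix + "-" + counter.incrementAndGet());
    }
}
```

Such a factory can be passed to the standard executor constructors, for example `Executors.newFixedThreadPool(8, new NamedThreadFactorySketch("metastore-handler"))`, where the pool size and the prefix are placeholder values.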
[jira] [Created] (HIVE-23966) Minor query-based compaction always results in delta dirs with minWriteId=1
Karen Coppage created HIVE-23966:

Summary: Minor query-based compaction always results in delta dirs with minWriteId=1
Key: HIVE-23966
URL: https://issues.apache.org/jira/browse/HIVE-23966
Project: Hive
Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage

Minor compaction after major/IOW will result in directories that look like:
* base_z_v
* delta_1_y_v
* delete_delta_1_y_v

Should be:
* base_z_v
* delta_(z+1)_y_v
* delete_delta_(z+1)_y_v
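The expected naming can be written down as a small sketch. Everything here is a simplification for illustration: `CompactionDirNamesSketch` is a hypothetical helper rather than Hive code, the numbers are left unpadded, and the trailing `v` in the names above is assumed to stand for a `v`-prefixed visibility transaction ID.

```java
// Hypothetical helper illustrating the expected directory names (not Hive's implementation).
final class CompactionDirNamesSketch {
    private CompactionDirNamesSketch() {}

    // With base_z present, the compacted delta should start at write ID z + 1, not 1.
    static long expectedMinWriteId(long baseWriteId) {
        return baseWriteId + 1;
    }

    // Builds "delta_<min>_<max>_v<visibilityTxnId>" (unpadded, simplified).
    static String deltaDir(long minWriteId, long maxWriteId, long visibilityTxnId) {
        return "delta_" + minWriteId + "_" + maxWriteId + "_v" + visibilityTxnId;
    }

    // Delete deltas use the same write ID range with a "delete_" prefix.
    static String deleteDeltaDir(long minWriteId, long maxWriteId, long visibilityTxnId) {
        return "delete_" + deltaDir(minWriteId, maxWriteId, visibilityTxnId);
    }
}
```

For a base at write ID 5 and a compaction covering everything up to write ID 9, the sketch yields a range starting at `expectedMinWriteId(5)`, that is 6, whereas the bug reported here produces a range starting at 1.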
[jira] [Created] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
Stamatis Zampetakis created HIVE-23965:

Summary: Improve plan regression tests using TPCDS30TB metastore dump and custom configs
Key: HIVE-23965
URL: https://issues.apache.org/jira/browse/HIVE-23965
Project: Hive
Issue Type: Improvement
Reporter: Stamatis Zampetakis

The existing regression tests (HIVE-12586) based on TPC-DS have certain shortcomings:
* The table statistics do not reflect cardinalities from a specific TPC-DS scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB dataset, and others from a 3GB dataset. This mix leads to plans that may never appear when using an actual TPC-DS dataset.
* The existing statistics do not contain information about partitions, something that can have a big impact on the resulting plans.
* The existing regression tests rely more or less on the default configuration (hive-site.xml). In real-life scenarios, though, some of the configurations differ and may impact the choices of the optimizer.

This issue aims to address the above shortcomings by using a curated TPCDS30TB metastore dump along with some custom Hive configurations.
[jira] [Created] (HIVE-23964) SemanticException in query 30 while generating logical plan
Stamatis Zampetakis created HIVE-23964:

Summary: SemanticException in query 30 while generating logical plan
Key: HIVE-23964
URL: https://issues.apache.org/jira/browse/HIVE-23964
Project: Hive
Issue Type: Bug
Reporter: Stamatis Zampetakis
Attachments: cbo_query30_stacktrace.txt

A SemanticException with the message "Invalid table alias or column reference 'c_last_review_date'" is thrown when running TPC-DS query 30 (cbo_query30.q, query30.q) on the metastore with the partitioned TPC-DS 30TB dataset. The respective stacktrace is attached to this case.
[jira] [Created] (HIVE-23963) UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule
Stamatis Zampetakis created HIVE-23963:

Summary: UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule
Key: HIVE-23963
URL: https://issues.apache.org/jira/browse/HIVE-23963
Project: Hive
Issue Type: Bug
Components: CBO
Reporter: Stamatis Zampetakis
Attachments: cbo_query74_stacktrace.txt, cbo_query84_stacktrace.txt

The following TPC-DS queries:
* cbo_query74.q
* cbo_query84.q
* query74.q
* query84.q

fail on the metastore with the partitioned TPC-DS 30TB dataset. The stacktraces for cbo_query74 and cbo_query84 show that the problem originates while applying HiveCardinalityPreservingJoinRule.
interesting feature of replay button
Hey All!

As you might know, the "replay" button can be used to re-launch a failed build on the Jenkins UI. Today I learned that it has a "hidden" feature: in run [1] it retained the merge point [2] against the master branch, which was a few days old. This means that new commits that landed on master since the first execution will not be taken into account. I believe this is caused by the fact that GitHub is "creating" the merge commit...

This is the second time that feature has gone sideways (branch indexing was the earlier issue), so once I have time for it I'll make some modifications to remove the need for that merge commit. Until then, I recommend either pushing an empty commit to trigger a new build or closing and reopening the PR to generate the appropriate events.

[1] http://ci.hive.apache.org/job/hive-precommit/job/PR-1252/11/
[2] e34acf5c677a23af0053ac98532a9caa9e190b6c

cheers,
Zoltan