[jira] [Created] (HIVE-23973) Use SQL constraints to improve join reordering algorithm (III)

2020-07-31 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-23973:
--

 Summary: Use SQL constraints to improve join reordering algorithm 
(III)
 Key: HIVE-23973
 URL: https://issues.apache.org/jira/browse/HIVE-23973
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23972) Add external client ID to LLAP external client

2020-07-31 Thread Jason Dere (Jira)
Jason Dere created HIVE-23972:
-

 Summary: Add external client ID to LLAP external client
 Key: HIVE-23972
 URL: https://issues.apache.org/jira/browse/HIVE-23972
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Jason Dere
Assignee: Jason Dere


There is currently no good way to tell which running LLAP tasks come from 
external LLAP clients, nor to know which application is submitting these 
external LLAP requests.
One possible solution is to add an option for the external LLAP client to pass 
in an external client ID, which HiveServer2 can log during the getSplits 
request and which can also be displayed in the LLAP executorsStatus.

cc [~ShubhamChaurasia]





[jira] [Created] (HIVE-23971) Cleanup unreleased method signatures in IMetastoreClient

2020-07-31 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-23971:
--

 Summary: Cleanup unreleased method signatures in IMetastoreClient
 Key: HIVE-23971
 URL: https://issues.apache.org/jira/browse/HIVE-23971
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


There are many methods in IMetastoreClient which are simply wrappers around 
another method. The code has become very intertwined and needs some cleanup. 
For instance, I see the following variations of {{getPartitionsByNames}} in 
{{IMetastoreClient}} 

{noformat}

List getPartitionsByNames(String db_name, String tbl_name, 
List part_names, boolean getColStats, String engine)

List getPartitionsByNames(String catName, String db_name, String 
tbl_name, List part_names)

List getPartitionsByNames(String catName, String db_name, String 
tbl_name, List part_names, boolean getColStats, String engine)
{noformat}

The problem seems to be that every time a new field is added to the request 
object {{GetPartitionsByNamesRequest}}, a new variant is introduced in 
IMetastoreClient. Many of these methods are not released yet, and it would be 
good to clean them up by using the request object as the method argument 
instead of individual fields. Once we release, we will not be able to change 
the method signatures, since IMetastoreClient is annotated as a public API.
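
A minimal sketch of the request-object approach described above, in plain 
Java. All class, field, and method names here are hypothetical stand-ins 
chosen for illustration; they are not the actual IMetastoreClient or 
GetPartitionsByNamesRequest API, and the method returns partition names 
rather than partition objects to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request object: every name here is illustrative, not Hive's API.
final class PartitionsByNamesRequestSketch {
    final String dbName;
    final String tblName;
    final List<String> partNames;
    String catName;        // optional; null means "not specified"
    boolean getColStats;   // optional; defaults to false
    String engine;         // optional; null means "not specified"

    PartitionsByNamesRequestSketch(String dbName, String tblName, List<String> partNames) {
        this.dbName = dbName;
        this.tblName = tblName;
        this.partNames = new ArrayList<>(partNames);
    }

    // Optional fields are set through chained setters, so adding a new
    // field later does not require a new client method overload.
    PartitionsByNamesRequestSketch withCatName(String catName) {
        this.catName = catName;
        return this;
    }

    PartitionsByNamesRequestSketch withGetColStats(boolean getColStats) {
        this.getColStats = getColStats;
        return this;
    }

    PartitionsByNamesRequestSketch withEngine(String engine) {
        this.engine = engine;
        return this;
    }
}

// Hypothetical client interface: a single method, however many fields
// the request object grows.
interface MetastoreClientSketch {
    List<String> getPartitionsByNames(PartitionsByNamesRequestSketch request);
}
```

With this shape, a caller that needs column stats sets one more field on the 
request instead of picking a different overload, and the interface signature 
stays stable after release.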





[jira] [Created] (HIVE-23970) Reject database creation if managedlocation is incorrect

2020-07-31 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-23970:


 Summary: Reject database creation if managedlocation is incorrect
 Key: HIVE-23970
 URL: https://issues.apache.org/jira/browse/HIVE-23970
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 4.0.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam


With some changes in HIVE-23387, the managed location check gets bypassed. This 
needs to be fixed.





[jira] [Created] (HIVE-23969) Table owner info not being passed during show tables in database.

2020-07-31 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-23969:


 Summary: Table owner info not being passed during show tables in 
database.
 Key: HIVE-23969
 URL: https://issues.apache.org/jira/browse/HIVE-23969
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala
 Attachments: Screen Shot 2020-07-31 at 10.55.51 AM.png, Screen Shot 
2020-07-31 at 10.56.25 AM.png, Screen Shot 2020-07-31 at 10.56.51 AM.png

Table owner information is not being passed in HiveMetaStore. As a result, even 
though a user is the owner of tables, without a Ranger policy the user is 
unable to view the tables they created.





Re: Hive TPC-DS metastore dumps in Postgres

2020-07-31 Thread Stamatis Zampetakis
There is now a PR [1] with various improvements over the last update. Feel
free to check it out and let me know what you think.

Best,
Stamatis

[1] https://github.com/apache/hive/pull/1347

On Mon, Jun 22, 2020 at 5:32 PM Stamatis Zampetakis 
wrote:

> Hey guys,
>
> I put up a small project on GitHub [1] with Hive metastore dumps from
> tpcds10tb/tpcds30tb (+partitioning) and some scripts to quickly spin up a
> dockerized Postgres with those loaded.
>
> Personally, I find it useful to check the plans of TPC-DS queries using
> the usual qtest mechanism (without external tools and tapping into a real
> cluster) having at hand beefy stats + partitioning info. The driver and
> other changes needed to run these tests are located in [2].
>
> I am sharing it here in case it might be of use to somebody else.
>
> The two main commands that you will need if you wanna try this out:
> docker build --tag postgres-tpcds-metastore:1.0 .
> mvn test -Dtest=TestTezPerfDBCliDriver -Dtest.output.overwrite=true
> -Dtest.metastore.db=postgres.tpcds
>
> Small caveat: Currently in [2] the dockerized postgres is restarted for
> every query which makes things slow. This will be fixed later on.
>
> Best,
> Stamatis
>
> [1] https://github.com/zabetak/hive-postgres-metastore
> [2] https://github.com/zabetak/hive/tree/qtest_postgres_driver
>


[jira] [Created] (HIVE-23968) CTAS with TBLPROPERTIES ('transactional'='false') does not entertain translated table location

2020-07-31 Thread Rajkumar Singh (Jira)
Rajkumar Singh created HIVE-23968:
-

 Summary: CTAS with TBLPROPERTIES ('transactional'='false') does 
not entertain translated table location
 Key: HIVE-23968
 URL: https://issues.apache.org/jira/browse/HIVE-23968
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Rajkumar Singh


The HMS translation layer converts the table to external because the 
transactional property is set to false, but MoveTask does not honor the 
translated table location and moves the data to the managed table location.

Steps to reproduce:

{code:java}
create table nontxnal TBLPROPERTIES ('transactional'='false') as select * from 
abc;
{code}

A select query on the table returns nothing, but the source table has data in it.
{code:java}
select * from nontxnal;
+--+
| nontxnal.id  |
+--+
+--+
{code}

Output of show create table:

{code:java}
CREATE EXTERNAL TABLE `nontxnal`(  |
|   `id` int)|
| ROW FORMAT SERDE   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
| STORED AS INPUTFORMAT  |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
| OUTPUTFORMAT   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION   |
|   'hdfs://hostname:8020/warehouse/tablespace/external/hive/nontxnal' |
| TBLPROPERTIES (|
|   'TRANSLATED_TO_EXTERNAL'='TRUE', |
|   'bucketing_version'='2', |
|   'external.table.purge'='TRUE',   |
|   'transient_lastDdlTime'='1596215634')|

{code}

The table data is moved to the managed location:
```
dfs -ls -R  hdfs://hostname:8020/warehouse/tablespace/managed/hive/nontxnal
. . . . . . . . . . . . . . . . . . . . . . .> ;
++
| DFS Output |
++
| -rw-rw+  3 hive hadoop201 2020-07-31 17:05 
hdfs://hostname:8020/warehouse/tablespace/managed/hive/nontxnal/00_0 |
++

```







[jira] [Created] (HIVE-23967) Give HiveMetastore thread pool a more descriptive name

2020-07-31 Thread Sam An (Jira)
Sam An created HIVE-23967:
-

 Summary: Give HiveMetastore thread pool a more descriptive name
 Key: HIVE-23967
 URL: https://issues.apache.org/jira/browse/HIVE-23967
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 4.0.0
Reporter: Sam An
Assignee: Sam An


Currently the HiveMetastore thread pool name uses the default generic format 
pool-id-thread-id. We should use a more descriptive one. 
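
One way to get descriptive thread names, sketched with only the JDK: pass a 
custom ThreadFactory when creating the pool. The class name 
NamedThreadFactory and any prefix passed to it are assumptions made for 
illustration; this is not the change made in Hive.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: names threads "<prefix>-<n>" instead of the JDK
// default "pool-N-thread-M".
final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(0);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // The counter makes each thread name unique within this factory.
        Thread thread = new Thread(task, prefix + "-" + counter.incrementAndGet());
        thread.setDaemon(true);
        return thread;
    }
}
```

It could then be used as, for example, 
Executors.newFixedThreadPool(4, new NamedThreadFactory("Metastore-Handler-Pool")), 
where the prefix is only a placeholder. Thread dumps and log lines would then 
show which pool a thread belongs to.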





[jira] [Created] (HIVE-23966) Minor query-based compaction always results in delta dirs with minWriteId=1

2020-07-31 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-23966:


 Summary: Minor query-based compaction always results in delta dirs 
with minWriteId=1
 Key: HIVE-23966
 URL: https://issues.apache.org/jira/browse/HIVE-23966
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


Minor compaction after a major compaction or INSERT OVERWRITE (IOW) results in 
directories that look like:
 * base_z_v
 * delta_1_y_v
 * delete_delta_1_y_v

Should be:
 * base_z_v
 * delta_(z+1)_y_v
 * delete_delta_(z+1)_y_v
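
The expected naming can be sketched as follows, using the same symbols as the 
lists above: z is the write ID of the base, y is the highest compacted write 
ID, and v is the trailing visibility suffix, treated here as an opaque 
string. The class and method names are hypothetical, and the sketch does not 
try to reproduce Hive's exact on-disk number formatting.

```java
// Hypothetical helper illustrating the expected directory names.
final class CompactedDirNames {
    // After a base at write ID z, the minor-compacted deltas should
    // start at z + 1 rather than at 1.
    static long expectedMinWriteId(long baseWriteId) {
        return baseWriteId + 1;
    }

    static String base(long baseWriteId, String suffix) {
        return "base_" + baseWriteId + "_" + suffix;
    }

    static String delta(long minWriteId, long maxWriteId, String suffix) {
        return "delta_" + minWriteId + "_" + maxWriteId + "_" + suffix;
    }

    static String deleteDelta(long minWriteId, long maxWriteId, String suffix) {
        return "delete_" + delta(minWriteId, maxWriteId, suffix);
    }
}
```

For a base at z = 5 and y = 9, the minimum write ID passed to delta and 
deleteDelta would be expectedMinWriteId(5), that is 6, instead of the 1 
reported in this issue.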





[jira] [Created] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-07-31 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23965:
--

 Summary: Improve plan regression tests using TPCDS30TB metastore 
dump and custom configs
 Key: HIVE-23965
 URL: https://issues.apache.org/jira/browse/HIVE-23965
 Project: Hive
  Issue Type: Improvement
Reporter: Stamatis Zampetakis


The existing regression tests (HIVE-12586) based on TPC-DS have certain 
shortcomings:

The table statistics do not reflect cardinalities from a specific TPC-DS scale 
factor (SF). Some tables are from a 30TB dataset, others from 200GB dataset, 
and others from a 3GB dataset. This mix leads to plans that may never appear 
when using an actual TPC-DS dataset. 

The existing statistics do not contain information about partitions, something 
that can have a big impact on the resulting plans.

The existing regression tests rely more or less on the default configuration 
(hive-site.xml). In real-life scenarios, though, some of the configurations 
differ and may impact the choices of the optimizer.

This issue aims to address the above shortcomings by using a curated TPCDS30TB 
metastore dump along with some custom hive configurations. 





[jira] [Created] (HIVE-23964) SemanticException in query 30 while generating logical plan

2020-07-31 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23964:
--

 Summary: SemanticException in query 30 while generating logical 
plan
 Key: HIVE-23964
 URL: https://issues.apache.org/jira/browse/HIVE-23964
 Project: Hive
  Issue Type: Bug
Reporter: Stamatis Zampetakis
 Attachments: cbo_query30_stacktrace.txt

Invalid table alias or column reference 'c_last_review_date' is thrown when 
running TPC-DS query 30 (cbo_query30.q, query30.q) on the metastore with the 
partitioned TPC-DS 30TB dataset. 

The respective stacktrace is attached to this case.





[jira] [Created] (HIVE-23963) UnsupportedOperationException in queries 74 and 84 while applying HiveCardinalityPreservingJoinRule

2020-07-31 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23963:
--

 Summary: UnsupportedOperationException in queries 74 and 84 while 
applying HiveCardinalityPreservingJoinRule
 Key: HIVE-23963
 URL: https://issues.apache.org/jira/browse/HIVE-23963
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Stamatis Zampetakis
 Attachments: cbo_query74_stacktrace.txt, cbo_query84_stacktrace.txt

The following TPC-DS queries: 
* cbo_query74.q
* cbo_query84.q 
* query74.q 
* query84.q 

fail on the metastore with the partitioned TPC-DS 30TB dataset.

The stacktraces for cbo_query74 and cbo_query84 show that the problem 
originates while applying HiveCardinalityPreservingJoinRule.





interesting feature of replay button

2020-07-31 Thread Zoltan Haindrich

Hey All!

As you might know, the "replay" button can be used to re-launch a failed build 
on the Jenkins UI.
Today I just learned that it has a "hidden" feature: in the [1] run it retained 
the merge point [2] against the master branch, which was a few days old. This 
means that new commits which landed on master since the first execution will 
not be taken into account; I believe this is caused by the fact that GitHub is 
"creating" the merge commit...



This is the second time that feature has gone sideways (branch indexing was the 
earlier issue), so when I have time for it I'll make some modifications to 
remove the need for that merge commit.


Until then, I recommend either pushing an empty commit to trigger a new build 
or closing and reopening the PR to generate the appropriate events.

[1] http://ci.hive.apache.org/job/hive-precommit/job/PR-1252/11/
[2] e34acf5c677a23af0053ac98532a9caa9e190b6c

cheers,
Zoltan