Re: [DISCUSS]: Commit guidelines for PRs

2020-06-30 Thread Peter Vary
Hi Vihang,

Like you, I like the new infra very much!

I have an opinion/answer for 2 of your questions below:

VK > 1. Whether to standardize on Squash into one commit

I think we should squash. The final code should not be polluted with the
meandering way we sometimes arrive at the final solution in some patch
processes with multiple reviewers.

VK > 3. Do committers merge the PR directly from the github?

Yes, I do. It is possible to connect your Apache account to your
GitHub account. You have to set up two-factor authentication for that. I
believe one of David Mollitor's previous emails contains more info on that.

I do not have as strong an opinion on question 2. I believe that if the title
of the Jira is good enough then it should be enough as a commit message
too. OTOH the PR/Jira discussion should contain the reasoning/debate behind
the scenes. But probably that's just me, already adjusted to the status quo
:)

Thanks, Peter

Vihang Karajgaonkar wrote (on Tue, Jun 30, 2020, 21:31):

> Thanks to all who worked on the new testing infrastructure. It definitely
> looks like a step up from the older test infrastructure.
>
> I wanted to know if there are any new guidelines for a committer for
> merging the PRs. Earlier we used to create one patch file for each JIRA and
> push it to the master branch. With PRs it is possible that a
> contributor publishes multiple commits (e.g. to address review comments). I
> would like to start a discussion on what the guidelines for merging PRs
> should be.
>
> Most of you are probably already following it but it would be good to
> formalize the following:
>
> 1. Whether to standardize on Squash into one commit
> <https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-request-merges#squash-and-merge-your-pull-request-commits>
> for the PR?
> 2. What are the commit message guidelines? Our project has unfortunately
> not been great in documenting the commit message appropriately. Current
> guidelines are to have one line commit message and the JIRA is expected to
> have more detailed information. However, most of the time the JIRAs don't
> have enough information. I think it would be good to add a few lines of
> description as part of the git commit message. Some projects recommend
> 50/72 formatting
> <https://stackoverflow.com/questions/2290016/git-commit-messages-50-72-formatting>
> for the git commit message which I feel is nice.
> 3. Do committers merge the PR directly from the github? I am not sure if
> there is a way for our committer credentials to be integrated in github.
> Otherwise, the other option could be that the committer checks out the PR
> and merges it manually into the master branch.
>
> Thanks,
> Vihang
>


[jira] [Created] (HIVE-23787) Write all the events present in a task_queue in a single file.

2020-06-30 Thread Amlesh Kumar (Jira)
Amlesh Kumar created HIVE-23787:
---

Summary: Write all the events present in a task_queue in a single file.
Key: HIVE-23787
URL: https://issues.apache.org/jira/browse/HIVE-23787
Project: Hive
Issue Type: Improvement
Components: Hive
Reporter: Amlesh Kumar


When the queue becomes full, DAS does not receive the event, and the 
post_exec_hook / pre_exec_hook event is ignored. The default capacity, set by 
the hive.hook.proto.queue.capacity config for HS2, is 64.

We will increase the queue capacity (say, up to 256).
As an optimisation, we also need to process all the events present in a 
task_queue and write them to a single file.
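
The batching idea above could be sketched roughly as follows. This is only an illustration in Python, not Hive's hook code: the function names, the JSON-lines file format, and the capacity constant are assumptions made for the example.

```python
import json
import queue
from pathlib import Path

# Hypothetical value, taken from the "up to 256" suggestion in this issue.
QUEUE_CAPACITY = 256


def offer(events: queue.Queue, event: dict) -> bool:
    """Try to enqueue an event without blocking; report whether it was accepted."""
    try:
        events.put_nowait(event)
        return True
    except queue.Full:
        # The caller learns about the drop instead of losing the event silently.
        return False


def drain_to_single_file(events: queue.Queue, path: Path) -> int:
    """Remove every event currently queued and append them all to one file."""
    batch = []
    while True:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break
    if batch:
        # One open/write cycle for the whole batch rather than one per event.
        with path.open("a", encoding="utf-8") as out:
            for event in batch:
                out.write(json.dumps(event) + "\n")
    return len(batch)
```

A writer would call drain_to_single_file periodically, so a burst of events costs one file write instead of many.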



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23786) HMS Server side filter

2020-06-30 Thread Sam An (Jira)
Sam An created HIVE-23786:
-

Summary: HMS Server side filter
Key: HIVE-23786
URL: https://issues.apache.org/jira/browse/HIVE-23786
Project: Hive
Issue Type: Improvement
Reporter: Sam An
Assignee: Sam An


HMS server side filter of results based on authorization. 





[DISCUSS]: Commit guidelines for PRs

2020-06-30 Thread Vihang Karajgaonkar
Thanks to all who worked on the new testing infrastructure. It definitely
looks like a step up from the older test infrastructure.

I wanted to know if there are any new guidelines for a committer for
merging the PRs. Earlier we used to create one patch file for each JIRA and
push it to the master branch. With PRs it is possible that a
contributor publishes multiple commits (e.g. to address review comments). I
would like to start a discussion on what the guidelines for merging PRs
should be.

Most of you are probably already following it but it would be good to
formalize the following:

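1. Whether to standardize on Squash into one commit
<https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-request-merges#squash-and-merge-your-pull-request-commits>
for the PR?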
2. What are the commit message guidelines? Our project has unfortunately
not been great in documenting the commit message appropriately. Current
guidelines are to have one line commit message and the JIRA is expected to
have more detailed information. However, most of the time the JIRAs don't
have enough information. I think it would be good to add a few lines of
description as part of the git commit message. Some projects recommend
50/72 formatting
<https://stackoverflow.com/questions/2290016/git-commit-messages-50-72-formatting>
for the git commit message which I feel is nice.
3. Do committers merge the PR directly from the github? I am not sure if
there is a way for our committer credentials to be integrated in github.
Otherwise, the other option could be that the committer checks out the PR
and merges it manually into the master branch.

Thanks,
Vihang
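
For reference, the 50/72 convention mentioned in point 2 (a subject line of at most 50 characters, a blank line, then body lines wrapped at 72 characters) could be checked with something like the sketch below. The function name and the wording of the reported problems are made up for illustration; this is not an existing Hive or git tool.

```python
from typing import List


def check_50_72(message: str) -> List[str]:
    """Return a list of 50/72 formatting problems found in a commit message."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]

    problems = []
    if len(lines[0]) > 50:
        problems.append("subject line is longer than 50 characters")
    # A body, if present, must be separated from the subject by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"line {number} is longer than 72 characters")
    return problems
```

An empty list means the message follows the convention; such a check could run in a commit-msg hook or as a PR check, if the project decides to adopt the format.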


Hive 4.0

2020-06-30 Thread Theyaa Matti
There are many Jiras that are tagged for the Hive 4.0 release. Is there a
timeline for when Hive 4 will be released?


[jira] [Created] (HIVE-23785) Database should have a unique id

2020-06-30 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-23785:
--

Summary: Database should have a unique id
Key: HIVE-23785
URL: https://issues.apache.org/jira/browse/HIVE-23785
Project: Hive
Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-20556 introduced an id field to the Table object. This is useful 
information, since a table which is dropped and recreated with the same name 
will have a different id. If an HMS client is caching such a table object, the 
id can be used to determine whether the table present on the client side 
matches the one in the HMS.

We can expand this idea to other HMS objects like Databases, Catalogs and 
Partitions and add a new id field to each.
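
The client-side check this enables could look roughly like the sketch below. It is a Python illustration only: the class and function names are hypothetical stand-ins, not the HMS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedObject:
    """Hypothetical stand-in for a cached HMS object (table, database, ...)."""
    name: str
    object_id: int  # assumed to change when the object is dropped and recreated


def cache_is_current(cached: CachedObject, server: CachedObject) -> bool:
    """A cached entry is current only if both name and id match the server copy."""
    return cached.name == server.name and cached.object_id == server.object_id
```

Comparing by name alone would treat a dropped and recreated database as unchanged; adding the id to the comparison is what catches that case.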





[jira] [Created] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-23784:
--

Summary: Fix Replication Metrics Sink to DB
Key: HIVE-23784
URL: https://issues.apache.org/jira/browse/HIVE-23784
Project: Hive
Issue Type: Task
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-23783) Support compaction in qtests

2020-06-30 Thread László Bodor (Jira)
László Bodor created HIVE-23783:
---

Summary: Support compaction in qtests
Key: HIVE-23783
URL: https://issues.apache.org/jira/browse/HIVE-23783
Project: Hive
Issue Type: Improvement
Reporter: László Bodor








[jira] [Created] (HIVE-23782) Beeline does not update application id on console if query was killed and started on new application

2020-06-30 Thread Adesh Kumar Rao (Jira)
Adesh Kumar Rao created HIVE-23782:
--

Summary: Beeline does not update application id on console if query was killed and started on new application
Key: HIVE-23782
URL: https://issues.apache.org/jira/browse/HIVE-23782
Project: Hive
Issue Type: Bug
Components: Beeline
Affects Versions: 4.0.0
Reporter: Adesh Kumar Rao
Assignee: Adesh Kumar Rao
Fix For: 4.0.0


After HIVE-23619, Beeline prints the application ID only once on the console. 
If the query gets killed and is executed again with another application, 
Beeline will not print the new application ID.

 





[jira] [Created] (HIVE-23781) Incomplete partition column stats in CachedStore may lead to wrong aggregate stats

2020-06-30 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23781:
--

Summary: Incomplete partition column stats in CachedStore may lead to wrong aggregate stats
Key: HIVE-23781
URL: https://issues.apache.org/jira/browse/HIVE-23781
Project: Hive
Issue Type: Bug
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Requesting aggregate stats from the Metastore ({{RawStore#get_aggr_stats_for}}) 
may return wrong results when the backing implementation is CachedStore and 
column statistics are missing from the cache.

The suspicious code lies inside {{CachedStore#mergeColStatsForPartitions}}, 
which returns an 
[empty object|https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L2267] 
when no stats are found in the cache. The consumer treats this as a valid 
value, so no additional lookup is performed in the rawstore to fetch the 
actual values.

Moreover, when the cache holds values for some of the requested partitions but 
not for all of them, the result will be wrong, assuming that the underlying 
rawstore has information about the requested partitions.
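
One way to avoid both failure modes is to make an incomplete cache report a miss explicitly, so the caller falls back to the rawstore. The sketch below illustrates that idea in Python; it is not the CachedStore code, and the function names and the use of a per-partition row count as the statistic are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Stats = Dict[str, int]  # partition name -> row count (illustrative statistic)


def cached_stats(cache: Stats, partitions: List[str]) -> Optional[Stats]:
    """Return stats only when the cache covers every requested partition."""
    if any(p not in cache for p in partitions):
        return None  # signal a miss rather than an empty-but-valid result
    return {p: cache[p] for p in partitions}


def aggregate_row_count(
    cache: Stats,
    partitions: List[str],
    fetch_from_rawstore: Callable[[List[str]], Stats],
) -> int:
    """Aggregate from the cache when complete, otherwise from the rawstore."""
    stats = cached_stats(cache, partitions)
    if stats is None:
        stats = fetch_from_rawstore(partitions)
    return sum(stats.values())
```

The key design choice is that "no data in the cache" and "data says zero" are distinguishable, which is exactly what an empty result object cannot express.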


