Re: Some tests started hanging recently

2020-06-19 Thread Jagat Singh
Hello Zoltan,

Thank you for this.

So, I ran a few tests under itests to try to replicate the issue.

There are 10 files inside itests that use Tez in one form or another. I
ran the tests for all of them.

All of them finished; the longest run took 8 minutes.

Below are the timings for all of them:

|Testname                          |Duration|
|TestMmCompactorOnTez              |3.41 min|
|TestAcidOnTez                     |        |
|TestCrudCompactorOnTez            |4.00 min|
|TestBeeLineWithArgs               |2.59 min|
|TestCopyUtils                     |49 s    |
|TestTriggersTezSessionPoolManager |22 s    |
|TestTriggersWorkloadManager       |21 s    |
|TestTriggersNoTezSessionPool      |1.50 min|
|TestTezPerfConstraintsCliDriver   |8 min   |

I am not sure what is causing the issue seen on the build server.

Regards,

Jagat Singh

On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich  wrote:

> Hey all,
>
> Since yesterday some tests started to hang - most frequently
> TestCrudCompactorOnTez or TestMmCompactorOnTez but I've seen a replication
> test as well - so I don't think it's limited to those 2 tests.
>
> I was not able to figure out what has caused this - my current guess is
> that somehow the tez 0.9.2 upgrade has caused it.
> To validate this guess I've started the flaky checker with and without
> that patch from the current state...
>
> I've collected some jstacks from the containers running for more than 20
> hours:
>
> https://termbin.com/z1eoc
> https://termbin.com/2m0j
> https://termbin.com/027t
> https://termbin.com/1dbe
>
> cheers,
> Zoltan
>


Re: Some tests started hanging recently

2020-06-18 Thread Jagat Singh
Hello Zoltan,

I was not expecting to hear this for my first PR :(

I will also try to re-run the tests locally on my system and report back to
you.

Thanks,

Jagat Singh



Re: Reviewers and assignees of PRs

2020-06-18 Thread Jagat Singh
Hello Zoltan,

One thing that needs improvement is keeping the Hive Contributors wiki
updated with whatever process happens on the GitHub and build-server side.

The current Confluence page is silent on what to expect when we create a PR
as a contributor: who will review it, what the build system will do, and
where to look for errors.

Based on my first PR experience: do you manually label PRs as tests stable,
unstable, etc.? I am not sure whether that can be automated, if it is not
already, along with the auto-assigning of reviewers that you intend to do
with this proposal.

I can update a few things based on what I have learnt, as I recently started
contributing and I feel all of these things are missing. But there are many
things for which I don't know the answer yet, and I would appreciate it if
someone experienced could update the wiki with details such as the answers
to the questions above.

Thanks,

Jagat Singh

On Thu, 18 Jun 2020 at 20:43, Zoltan Haindrich  wrote:

> Hey Panos!
>
> On 6/18/20 11:54 AM, Panos Garefalakis wrote:
> > My only suggestion would be to make reviewing per package/label instead of
> > files. This will make the process a bit more clear.
>
> we could use path globs to select the files - so it could match on
> packages as well
> I've not really used it
> '**/schq/**'
>
> > I recently bumped into this GitHub action that lets you automatically label
> > PRs based on what paths they modify and could help us towards that goal.
> >
> > https://github.com/actions/labeler
>
> Sure; we can have that as well! They may fit different purposes.
> Actually - based on the "absence" of some labels (eg: metastore) we may
> "skip" some tests.
>
> cheers,
> Zoltan
>
> >
> > Thoughts?
> >
> > Cheers,
> > Panagiotis
> >
> > On Thu, Jun 18, 2020 at 10:42 AM Zoltan Haindrich  wrote:
> >
> >> Hey all!
> >>
> >> I'm happy to see that (I guess) everyone is using the PR based stuff
> >> without issues - there is still some flaky stuff from time to time; but
> >> I feel that patches go in faster - and I have a feeling we have more
> >> reviews going on as well - which is awesome!
> >>
> >> I've read a bit about github "reviewers" / "assignee" stuff - because it
> >> seemed somewhat confusing...
> >> Basically both of them could be a group of users - the meaning of these
> >> fields should be filled by the community.
> >> I would like to propose using the "reviewers" field for the people from
> >> whom reviews might be expected,
> >> and the assignee field to list those who should approve the change to
> >> go in (anyone may add assignees/reviewers).
> >>
> >> We sometimes forget PRs and they may become "stale"; most of them are
> >> just falling through the cracks... To prevent this, it would be best if
> >> everyone self-assigned PRs which are in his/her area of interest.
> >>
> >> There are times when a given feature needs to change not closely
> >> related parts of the codebase - this is usually fine; but there are
> >> places which might need "more eyes" on reviews.
> >> In the past I was sometimes surprised by some interesting changes in say
> >> the thrift api / package.jdo / antlr stuff.
> >>
> >> Because the jira title may not suggest what files will be changed - I
> >> wanted to find a way to auto add some kind of notifications to PRs.
> >>
> >> Today I've found a neat solution to this [1] - which goes a little bit
> >> beyond what I anticipated - there is a small plugin which makes it
> >> possible to auto-add reviewers based on
> >> the changed files (adding a reviewer will also emit an email) - I had to
> >> fix a few small issues with it to ensure that it works/etc [2].
> >>
> >> I really like this approach because it could change the
> >> direction of things - and could mean that contributors don't
> >> necessarily need to look for reviewers.
> >> (but this seems more like just sci-fi right now - let's start small and
> >> go from there...)
> >>
> >> I propose to collect some globs and reviewers in a google doc before we
> >> first commit this file into the repo - so that everyone could add things
> >> he/she is interested in.
> >>
> >> cheers,
> >> Zoltan
> >>
> >> [1] https://github.com/marketplace/actions/auto-assign-reviewer-by-files
> >> [2] https://github.com/kgyrtkirk/auto-assign-reviewer-by-files
> >> [3] https://docs.google.com/document/d/11n9acHby31rwVHfRW4zxxYukymHS-tTSYlJEghZwJaY/edit?usp=sharing
> >>
> >
>


Re: HCatalog tests create test output inside source code folders fails rat

2020-06-17 Thread Jagat Singh
Thanks, Zoltan,

I can raise a PR for the annoyance I faced.

I am not sure what end action we want after checking whether the working
tree is clean: do you see us just displaying a message in Gradle, or
actually doing something with those files?
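
To make the question concrete, here is a minimal sketch of the "just display a message" option. It is only an illustration, not an existing Hive build step: the class name WorkTreeCheckSketch and the method dirtyPaths are hypothetical, and it assumes the check shells out to `git status --porcelain` from the repository root after the tests have run.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: report files that tests left behind in the work tree.
public class WorkTreeCheckSketch {

    // Each line of `git status --porcelain` (v1 format) is two status
    // characters, one space, then the path. Returns only the paths.
    public static List<String> dirtyPaths(List<String> porcelainLines) {
        List<String> paths = new ArrayList<>();
        for (String line : porcelainLines) {
            if (line.length() > 3) {
                paths.add(line.substring(3));
            }
        }
        return paths;
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            Process git = new ProcessBuilder("git", "status", "--porcelain")
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            List<String> lines;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(git.getInputStream(), StandardCharsets.UTF_8))) {
                lines = reader.lines().collect(Collectors.toList());
            }
            if (git.waitFor() != 0) {
                System.out.println("git status failed; is this a git checkout?");
                return;
            }
            List<String> dirty = dirtyPaths(lines);
            if (dirty.isEmpty()) {
                System.out.println("working tree is clean");
            } else {
                // Assumption: we only report; a real check might fail the build here.
                System.out.println("working tree is dirty after tests:");
                dirty.forEach(p -> System.out.println("  " + p));
            }
        } catch (IOException e) {
            System.out.println("could not run git: " + e.getMessage());
        }
    }
}
```

With the HCatalog leftovers present, I would expect the untracked mapred directory to show up in that list; whether the build should then fail or only warn is exactly the open question.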

Thanks in advance,

Jagat Singh

On Wed, 17 Jun 2020 at 18:52, Zoltan Haindrich  wrote:

> Hey Jagat!
>
> Yeah; this looks pretty annoying...I think these are some ancient tests; I
> don't think those files should be there; this should be fixed.
> Could you file a jira to fix it?
> I think after running the tests we might want to also add a check that the
> worktree is clean.
>
> cheers,
> Zoltan
>
>
> On 6/15/20 6:55 AM, Jagat Singh wrote:
> > Hello all,
> >
> > Currently, this line causes test output data to be produced inside
> > folders that are not covered by the rat exclude rules. This makes the rat
> > checks fail due to the absence of license headers. Is this intentional or
> > should it be fixed? Ideally, the test output data should not stay inside the
> > current folder structure and should reside in some standard temporary
> > folder. The folder mapred/testHCatMapReduceOutput gets created under
> > hcatalog/core at this moment.
> >
> > https://github.com/apache/hive/blob/3ab174d82ffc2bd27432c0b04433be3bd7db5c6a/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/HCatMapReduceTest.java#L403
> >
> >
> > Path path = new Path(fs.getWorkingDirectory(),
> > "mapred/testHCatMapReduceOutput");
> >
> > /home/jj/dev/code/open/hive/hcatalog/core/mapred
> > ├── testHCatMapReduceInput
> > └── testHCatMapReduceOutput
> >     ├── part-m-0
> >     ├── part-m-1
> >     ├── part-m-2
> >     ├── part-m-3
> >     ├── part-m-4
> >     └── _SUCCESS
> >
> > 1 directory, 7 files
> >
> > Thanks for reading and in advance thanks for your reply.
> >
> > Regards,
> >
> > Jagat Singh
> >
>


Hive Dev Unit tests parallel execution

2020-06-17 Thread Jagat Singh
Hello everyone,

Is it possible to run Hive unit tests in parallel?

Is this document up to date?

https://cwiki.apache.org/confluence/display/Hive/Unit+Test+Parallel+Execution

Thanks in advance for your help.

Regards,

Jagat Singh


[jira] [Created] (HIVE-23689) Bump Tez version to 0.9.2

2020-06-15 Thread Jagat Singh (Jira)
Jagat Singh created HIVE-23689:
--

             Summary: Bump Tez version to 0.9.2
                 Key: HIVE-23689
                 URL: https://issues.apache.org/jira/browse/HIVE-23689
             Project: Hive
          Issue Type: Improvement
            Reporter: Jagat Singh


Bump Tez version to 0.9.2 from 0.9.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


HCatalog tests create test output inside source code folders fails rat

2020-06-14 Thread Jagat Singh
Hello all,

Currently, this line causes test output data to be produced inside
folders that are not covered by the rat exclude rules. This makes the rat
checks fail due to the absence of license headers. Is this intentional or
should it be fixed? Ideally, the test output data should not stay inside the
current folder structure and should reside in some standard temporary
folder. The folder mapred/testHCatMapReduceOutput gets created under
hcatalog/core at this moment.

https://github.com/apache/hive/blob/3ab174d82ffc2bd27432c0b04433be3bd7db5c6a/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/HCatMapReduceTest.java#L403


Path path = new Path(fs.getWorkingDirectory(),
"mapred/testHCatMapReduceOutput");

/home/jj/dev/code/open/hive/hcatalog/core/mapred
├── testHCatMapReduceInput
└── testHCatMapReduceOutput
    ├── part-m-0
    ├── part-m-1
    ├── part-m-2
    ├── part-m-3
    ├── part-m-4
    └── _SUCCESS

1 directory, 7 files
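
As a rough illustration of the "standard temporary folder" idea, the sketch below creates the output directory under the JVM's temporary directory instead of the working directory. It is not the actual fix: it uses plain java.nio rather than the Hadoop Path/FileSystem API that the test uses, and the class name TempTestOutputSketch and the method newTestOutputDir are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: keep test output out of the source tree.
public class TempTestOutputSketch {

    // Creates a fresh, uniquely named directory under java.io.tmpdir.
    // The given prefix only makes the directory easy to recognise.
    public static Path newTestOutputDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix + "-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = newTestOutputDir("testHCatMapReduceOutput");
        System.out.println("test output would go to: " + out);
        // Clean up the (still empty) directory created for this demo.
        Files.deleteIfExists(out);
    }
}
```

Because the directory lives outside the checkout, rat would never see the generated part files, and nothing would need to be added to the exclude list.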

Thanks for reading and in advance thanks for your reply.

Regards,

Jagat Singh


Testing Hive 4.0.0-SNAPSHOT with Hadoop 3.2.1 and Spark 3.0.0-preview2

2020-06-14 Thread Jagat Singh
Hello everyone,

I was playing with the latest source code of Hive; my goal is to make
Hadoop, Hive, and Spark work with the latest versions of each other.

Locally, I made changes and ran the full tests to make it run with:

Hadoop 3.2.1
Spark 3.0.0-preview2
Tez 0.9.2

I ran the following Maven command to ensure the tests run successfully and
made any changes required.

mvn clean package -Pdist

1)
I was wondering whether I can create a Jira to share these changes. I read
the contributors guide here
https://cwiki.apache.org/confluence/display/Hive/HowToContribute and it
says to ask on this mailing list before creating any Jira. If yes, should I
create a separate Jira for each of the Hadoop 3.2.1, Spark 3.0.0-preview2,
and Tez 0.9.2 changes?

2)
How should I do further testing to ensure things are working correctly? I
am also trying to look at itests, for example with the command below, but it
does not give any meaningful results. Maybe I am doing something wrong?

mvn test -q -Pitests -Dtest=TestSparkCliDriver

Thanks for reading.

Regards,

Jagat Singh