Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-15 Thread Deepak Jaiswal


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> >

Thanks for the review. I will work on the issues and update the patch.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > hbase-handler/src/test/results/positive/external_table_ppd.q.out
> > Lines 59 (patched)
> > 
> >
> > Are there any tests for the old-style bucketing, to make sure that 
> > previously created bucketed tables still work properly?

That is a good point. Will work on it.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
> > Lines 25 (patched)
> > 
> >
> > Unnecessary change?

Yes. I was using it before, but eventually stopped. Thanks for pointing it out.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
> > Lines 850 (patched)
> > 
> >
> > missing comment?

Missed cleanup.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
> > Line 1053 (original), 1051 (patched)
> > 
> >
> > If this occurs every row, I wonder if it would be better to determine 
> > the bucketing version once during initializeOp() and create some object 
> > which knows which bucketing hash code method to call here.

Makes sense. Let me work on this.
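A rough sketch of that direction (purely illustrative, not the actual patch; the 
class and method names are hypothetical and the two hash bodies are stand-ins 
for the real old/Murmur implementations):

import java.util.Arrays;

interface BucketHashCodeStrategy {
  int hash(Object[] bucketFields);
}

class FileSinkBucketing {
  private BucketHashCodeStrategy hashStrategy;

  // Called once from initializeOp(); the version comes from the table/op traits.
  void initBucketing(int bucketingVersion) {
    if (bucketingVersion == 2) {
      // Version 2: Murmur-based hash (stand-in body).
      hashStrategy = fields -> Arrays.deepHashCode(fields) * 31;
    } else {
      // Any other version: legacy Hive bucket hash (stand-in body).
      hashStrategy = fields -> Arrays.deepHashCode(fields);
    }
  }

  // The per-row path just delegates; no per-row version check.
  int bucketHashCode(Object[] bucketFields) {
    return hashStrategy.hash(bucketFields);
  }
}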


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
> > Lines 469 (patched)
> > 
> >
> > should we validate that this is a valid bucketing version that we 
> > support?

The way I use bucketingVersion: if it is version 2, we use Murmur hash; otherwise 
we don't care and use the old hashing.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
> > Lines 639 (patched)
> > 
> >
> > Do we also need to check the bucketing type in the case that op is not 
> > a TableScan? If op is a ReduceSink or Join, would that end up being 
> > bucketingVersion 2?

The idea is to make sure that when SMB happens on the map side, all the tables 
involved have the same bucketing version.
I was thinking that if SMB happens on the reducer side it wouldn't matter, but it 
looks like it may. Let's talk about it in person.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/AnnotateWithOpTraits.java
> > Lines 72 (patched)
> > 
> >
> > Was this commented code for testing?

My bad again, it should have been cleaned up.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
> > Lines 411 (patched)
> > 
> >
> > It seems to me a lot of the logic will treat -1 as bucketing version 1, 
> > since there are a lot of (bucketingVersion == 2 ? doVersion2 : doVersion1) 
> > statements. Where in the code would SMB be disabled because of -1 
> > bucketingVersion?

-1 is used when the bucketing version for the join can't be determined, thus 
disabling SMB.
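As a minimal illustration of that sentinel (hypothetical names; not the actual 
code in OpTraitsRulesProcFactory or ConvertJoinMapJoin), the version resolved for 
a join collapses to -1 on any mismatch or unknown parent, and the SMB conversion 
check then bails out:

import java.util.List;

class BucketingVersionResolver {

  // Combine the bucketing versions of all join inputs; -1 means "undetermined".
  static int resolveJoinBucketingVersion(List<Integer> parentVersions) {
    int resolved = 0; // 0 = not yet set
    for (int v : parentVersions) {
      if (v == -1) {
        return -1;       // a parent's version is already undetermined
      }
      if (resolved == 0) {
        resolved = v;    // first parent sets the candidate version
      } else if (resolved != v) {
        return -1;       // mismatch between parents: undetermined
      }
    }
    return resolved == 0 ? -1 : resolved;
  }

  // SMB join conversion would only proceed for a known, consistent version.
  static boolean canConvertToSMB(int joinBucketingVersion) {
    return joinBucketingVersion != -1;
  }
}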


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
> > Lines 187 (patched)
> > 
> >
> > Maybe make some common utility to parse/validate bucketing version, 
> > that both places can use?

Hmm. Let me look into this.
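Something along these lines could be shared by Table.java and TableDesc.java (a 
hedged sketch under assumed names; the constants and the exact fallback behavior 
would need to match whatever the patch settles on):

final class BucketingVersionUtils {

  static final int BUCKETING_VERSION_OLD = 1;
  static final int BUCKETING_VERSION_MURMUR = 2;

  private BucketingVersionUtils() {}

  // Parse the stored table-property value; fall back to the old version when
  // the property is absent, and reject values that are neither 1 nor 2.
  static int parseBucketingVersion(String propertyValue) {
    if (propertyValue == null || propertyValue.isEmpty()) {
      return BUCKETING_VERSION_OLD;
    }
    final int version;
    try {
      version = Integer.parseInt(propertyValue.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid bucketing version: " + propertyValue, e);
    }
    if (version != BUCKETING_VERSION_OLD && version != BUCKETING_VERSION_MURMUR) {
      throw new IllegalArgumentException("Unsupported bucketing version: " + version);
    }
    return version;
  }
}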


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
> > Lines 198 (patched)
> > 
> >
> > Validate bucketing version number?

Same as for Table.java comment.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
> > Lines 32 (patched)
> > 
> >
> > Docs for this UDF will probably need to mention that this uses the old 
> > hashing/bucketing scheme and that a new one has replaced it.

Should I open a documentation JIRA to track this?


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMurmurHash.java
> > Lines 1 

Re: Apache Hive 3.0.0 Release branch cutoff

2018-04-15 Thread Lefty Leverenz
This would be a good time for contributors to document their 3.0.0 patches.

108 jiras have TODOC3.0 labels, but undoubtedly a few dozen more should
have gotten labeled since January 1st (when I stopped scanning email for
doc issues).

36 developers are the assignees for those 108 issues.  If you're one of
them, please don't hide your contributions in the code -- use the Hive wiki
to let the world know about your brilliant patches.

Thanks to those who already took care of their docs!

-- Lefty


On Mon, Apr 9, 2018 at 5:15 PM Vineet Garg  wrote:

> Hello,
>
> To keep with the naming conventions for release branches, I have created branch-3
> for the Apache Hive 3.0.0 release. Please use this branch instead of
> branch-3.0.0.
>
> Vineet G
> On Apr 9, 2018, at 1:45 PM, Vineet Garg > wrote:
>
> Hello,
>
> The branch for 3.0.0 release has been cut off (branch-3.0.0).
>
> I am going to update the fix version of all unresolved JIRAs which aren't marked
> blocker from 3.0.0 to 3.1.0. Please update them if you would like to
> get your patch into 3.0.0.
>
> Thanks,
> Vineet G
>
>


[jira] [Created] (HIVE-19218) Upgrade to Hadoop 3.1.0

2018-04-15 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-19218:
---

 Summary: Upgrade to Hadoop 3.1.0
 Key: HIVE-19218
 URL: https://issues.apache.org/jira/browse/HIVE-19218
 Project: Hive
  Issue Type: Task
Reporter: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19217) Upgrade to Hadoop 3.1.0

2018-04-15 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19217:
---

 Summary: Upgrade to Hadoop 3.1.0
 Key: HIVE-19217
 URL: https://issues.apache.org/jira/browse/HIVE-19217
 Project: Hive
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Upgrade to Hadoop 3.1.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #333: HIVE-19197: TestReplicationScenarios is flaky

2018-04-15 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/333

HIVE-19197: TestReplicationScenarios is flaky



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-19197

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/333.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #333


commit 07cb28befb39978abffe0ec270df6324ef4a22ae
Author: Sankar Hariappan 
Date:   2018-04-15T12:56:37Z

HIVE-19197: TestReplicationScenarios is flaky




---


Re: Recent change to skip tests

2018-04-15 Thread Eugene Koifman
If the goal is to optimize for multiple patch submissions, then instead of skipping 
one, perhaps it should take the latest patch from the JIRA and cancel the future 
runs already in the queue.
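In other words (a purely illustrative sketch, not the actual ptest code), the 
queue would be de-duplicated by JIRA id, keeping the soonest entry (which would 
test the latest uploaded patch anyway) and cancelling the later duplicates:

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PreCommitQueue {

  // Return the queue (in execution order) with duplicate JIRA ids removed,
  // keeping only the first occurrence of each.
  static List<String> dropDuplicateRuns(List<String> queuedJiraIds) {
    Set<String> seen = new LinkedHashSet<>();
    List<String> deduped = new ArrayList<>();
    for (String jira : queuedJiraIds) {
      if (seen.add(jira)) {
        deduped.add(jira);  // first occurrence runs, against the latest patch
      }
      // later occurrences are cancelled rather than skipping the current run
    }
    return deduped;
  }
}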

On 4/13/18, 11:16 AM, "Deepak Jaiswal"  wrote:

I reopened HIVE-19077 to remove the future instance instead of the current 
one.

On 4/13/18, 11:10 AM, "Deepak Jaiswal"  wrote:

Hi,

It seems someone committed a patch to modify ptests. I see this on the 
ptest console output,


“Checking PreCommit-HIVE-Build queue...
PreCommit-HIVE-Build has the following jira(s) in queue: [18845, 19161, 
19054, 19161, 19126, 19184, 18652, 19184, 19187, 19175, 18902, 19104, 16041, 
12369, 12192, 18862, 19009, 18739, 19104, 19167, 18910, 16144, 19106, 18816, 
19133, 19162, 18986, 19191, 17645, 19186, 18609, 18469, 17824, 19048, 18252, 
18252, 19001, 18739, 18915, 19154, 19096]
Skipping ptest execution, as HIVE-18910 is scheduled in queue in the 
future too.
“

After waiting for 2 days I get to see this, which does not make any 
sense. The queue is so long and then you get pushed back? Worse, I am not even 
in the queue anymore.
I can't find the JIRA which did this, so I request the developer who did 
it to revert it and rework it to remove the future instance from the 
queue instead. If you have a time constraint, please assign it to me.

Regards,
Deepak